selected as the most appropriate one for high performance
the simulated version of the multicluster loop interface.

I. INTRODUCTION


A. BACKGROUND 


As computer technology grows rapidly, the complexity of computer
systems, both hardware and software, has been increasing. The demand
for the lowest possible cost, for expansion in the smallest possible
increments and for enhanced user convenience has driven the trend
toward multiprocessor systems.

To enhance throughput, reliability, computing power, parallelism and
economies of scale, additional processors can be added to some systems.
In early multiprocessor systems the additional processors had
specialized functions, e.g., I/O peripherals. Later multiprocessing
systems evolved to include the concept of one large CPU and several
peripheral processors. These processors may perform quite sophisticated
tasks, such as running a display. A more common type of multiprocessing
is a system having two or more processors, each of equal power. [Ref. 1]

A related arrangement is the computer network, in which many different
computers, often at great distances from one another, are connected to
perform repetitive functions. Networks typically perform a function by
spreading its pieces across the whole system.

There are various ways to connect and operate a multiprocessor system,
loop architecture being one of them. The primary advantages of loop
systems are their relatively low cost and high modularity. Different
loop types will be introduced, with particular emphasis on the Delay
Insertion Loop.
The bit-serial interconnect scheme is attractive since it allows the
user to interconnect processors made by different manufacturers, often
with different internal architectures, without too much concern for the
communications impact on the existing system software. [Ref. 2]

The transputer (transistor computer) is a new single-chip computer. It
is presented here and used as a powerful uniprocessor in concurrent
multiprocessor systems programmed in the new language OCCAM.


B. MOTIVATION OF THE THESIS


The importance of communication in a multiprocessor system increases as
the number of processors in the system increases. The need for
effective communication within a cluster of processors led us to choose
a loop-type serial communication interface, specifically the Delay
Insertion Loop, since it provides efficient use of transmission
facilities as well as signal transparency, expandability (growth
flexibility), relatively low cost and high modularity.

The transputer is chosen as a powerful component processor for building
a multiprocessor system, and it is designed to implement a particular
programming language, OCCAM, efficiently. OCCAM enables the behaviour of
concurrent systems to be explicitly programmed and controlled. The OCCAM
language retains the efficiency of an assembler, in terms of program
density and performance, while offering the productivity and
reliability advantages of programming in a high level language.
[Ref. 3]

The quantitative performance and capacity figures of the IMS T424
transputer [Ref. 4], summarized in Table 1, show why this work is
attractive.


TABLE 1
CAPABILITIES OF THE IMS T424

processor  . . . . . . . 32 bits
processing speed . . . . 10 MIPS (950 nanosecond mult.)
memory capacity  . . . . 32 bit address bus
built-in memory  . . . . 4 KBytes RAM
serial bus . . . . . . . 4 INMOS links (1.5 Mbytes/sec)
parallel bus . . . . . . 25 Mbytes/sec (max. transfer)
peripheral interface . . 8 bits bidir. (4 Mbytes/sec)
power dissipation  . . . 0.9 Watts
physical . . . . . . . . 45 mm² chip mounted in an 84 contact
                         leadless chip carrier





Table 1 shows attractive parameter values of the transputer as a
uniprocessor component for a multiprocessing environment. The
transputer provides excellent hardware for implementing concurrent
processing. It is designed with a reduced instruction set architecture
which implements the OCCAM concurrent programming language efficiently.
[Ref. 5]

Another important feature of the transputer is its four bidirectional
serial communication channels. By appropriately connecting the serial
links, it is possible to obtain multiple communication paths between two
elements of a multitransputer system. These multiple paths provide
graceful degradation in redundant multitransputer systems: a failed
element can simply be bypassed, and processing continues on the other
elements of the system.


C. OBJECTIVES


This thesis implements a model of a Delay Insertion Loop type of serial
communication interface for a real-time multitransputer system, using
the programming language OCCAM. A four-transputer model will be used to
illustrate the interface programming. Although the model uses only four
transputers, any number of transputers may be connected together using
this loop concept. Fault tolerance issues, and how they can be resolved,
are discussed separately.


D. THESIS ORGANIZATION 


The introduction just presented gives the reader a brief look at the
multitransputer architectural concept, the transputer and the OCCAM
language.

Chapter II will present the hardware architecture and capabilities of
the transputer and of multitransputer systems. Chapter III will present
OCCAM as a new concurrent processing language, together with its special
features. Chapter IV will describe multitransputer systems,
multiprocessing, interconnect structures and loop-type communication
systems. Chapter V will implement a four-transputer model of a Delay
Insertion Loop type of serial communication interface.

The final chapter will present conclusions, observations resulting from
this thesis effort and suggestions for further research. The software
program of the system is provided as an appendix.
II. HARDWARE


A. TRANSPUTER


The transputer is a programmable component. The term "transputer" is
derived from 'transistor' and 'computer', since the transputer is both a
computer on a chip and a silicon component like a transistor. It is a
single-chip computer which provides a direct implementation of the
process model of computing, in which each process is an independent
computation with its own data and program. The processes are executed
in a time-shared mode on the transputer, and special instructions are
provided to support the process model of communication. A transputer is
a microcomputer with its own local memory and with link interfaces for
connecting one transputer to another [Ref. 3].


B. WHY TRANSPUTER ? 


There are some problems in the design of concurrent systems. Three
apparent ones are:
1. Hardware problems (how to connect the computers).
2. Programming problems (how to program tens or hundreds of connected
   machines).
3. Design problems (how to design the system as a whole).
[Ref. 6]


The IMS T424 32-bit transputer provides an effective solution to these
problems. The programming problem may be solved by using an appropriate
concurrent programming language; OCCAM is recommended.
The transputer can be used as a single chip stand-alone
component or in networks to build high performance concur- 
rent systems. As a stand-alone component, the transputer can 
be programmed in conventional high level languages. It is 
designed to implement a particular concurrent programming 
language, OCCAM, efficiently. 

The transputer allows arbitrarily large systems to be constructed using
localized processing and communication. OCCAM exploits this locality,
so multitransputer systems can be used effectively.

The transputer uses point-to-point serial communication links, and
therefore provides maximum communication speed with minimal wiring.
Correspondingly, OCCAM uses point-to-point channels.


C. GENERAL FEATURES OF TRANSPUTER 


The main components of the transputer (memory, processor, links and
peripheral interface) are described in the following subsections.
[Ref. 3]

1. Memory


The T424 contains 4 Kbytes of static RAM which cycles synchronously
with the processor and provides a maximum data transfer rate of
80 Mbytes/sec. The memory interface (Figure 2.1) uses a 32 bit
multiplexed data and address bus to give high performance access to
external memory. It can extend the addressing capability to a total of
4 Gbytes in a single linear address space.
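Both figures in the paragraph above follow from simple arithmetic. The Go sketch below assumes, for illustration only, that the internal RAM delivers one 32-bit (4-byte) word per 50 ns processor cycle (the 20 MHz part of Table 3); the helper names are hypothetical.

```go
package main

import "fmt"

// bandwidthMBps returns a transfer rate in Mbytes/s (10^6 bytes per second)
// for a memory that moves bytesPerCycle bytes every cycleNs nanoseconds.
func bandwidthMBps(bytesPerCycle, cycleNs float64) float64 {
	return bytesPerCycle / cycleNs * 1000 // bytes per ns * 1000 = Mbytes/s
}

// addressSpaceBytes returns the number of bytes reachable with an n-bit address.
func addressSpaceBytes(bits uint) uint64 {
	return uint64(1) << bits
}

func main() {
	// Assumption: one 4-byte word per 50 ns processor cycle.
	fmt.Printf("on-chip RAM: %.0f Mbytes/s\n", bandwidthMBps(4, 50))
	// 2^32 bytes; 1 Gbyte is taken here as 2^30 bytes.
	fmt.Printf("address space: %d Gbytes\n", addressSpaceBytes(32)>>30)
}
```

Under these assumptions the program prints 80 Mbytes/s and 4 Gbytes, matching the figures quoted in the text.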


A number of preset timing configurations are provided to suit a wide
variety of memories. All the timing strobes for dynamic RAMs are
generated, as well as the necessary refresh cycles. A memory cycle
consists of six phases. Each refresh cycle outputs a nine-bit refresh
address, and the user can choose the interval between refresh cycles.
An asynchronous wait input is provided so that the memory timing can
also be determined externally if required. Wait states generated by the
configurable strobes can extend the interface cycle for slow external
devices; wait states generated by external logic can extend the memory
cycle indefinitely. A memory access cycle is completed in 150
nanoseconds, providing a maximum data rate of 25 Mbytes/second, without
requiring the phases for address multiplexing.

Figure 2.1 Memory Interface Driving Static RAM's

2. Processor


The T424 32 bit processor is designed to implement high-level languages
(e.g. OCCAM, C and Pascal) efficiently and to provide high performance
communication between concurrent processes. Its instruction execution
rate is 10 MIPS. Typical instruction execution times are shown in
Table 2.


TABLE 2
INSTRUCTION EXECUTION TIMES OF IMS T424

INSTRUCTION                        EXECUTION TIME (nanoseconds)
arithmetic operators
  multiplication . . . . . . . .   950
  division . . . . . . . . . . .   ...
  remainder  . . . . . . . . . .   ...
comparison operators
logical operators
shifting
variable
vector variable
expression evaluation
  constant
  parenthesis
constructor
  sequential
  parallel
  alternative
  branch
  repetitive
communication
  ! (output), ? (input)
assignment





High-level language expressions are evaluated on an evaluation stack of
32-bit registers. The instructions specify the registers of the
evaluation stack implicitly, allowing compact coding of instructions,
and the correct and optimal sequence of these instructions is easy for
a compiler to generate. Each instruction is one byte long and is divided
into two four-bit fields: function and operand. This format is also
simple to decode, which contributes to the high performance of the
processor. High-level language support is enhanced with instructions
for array bounds checking, arithmetic overflow detection and
multi-word-length arithmetic.
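The one-byte format can be decoded with a shift and a mask. This Go sketch assumes the function code occupies the upper four bits and the operand the lower four; the type and function names are hypothetical, and it does not model how larger operands are built up.

```go
package main

import "fmt"

// Instruction holds the two 4-bit fields of a one-byte instruction.
type Instruction struct {
	Function uint8 // upper four bits (assumed layout)
	Operand  uint8 // lower four bits (assumed layout)
}

// Decode splits an instruction byte into its function and operand fields.
func Decode(b byte) Instruction {
	return Instruction{Function: b >> 4, Operand: b & 0x0F}
}

// Encode packs a function code and an operand (each 0..15) into one byte.
func Encode(fn, operand uint8) byte {
	return (fn&0x0F)<<4 | operand&0x0F
}

func main() {
	in := Decode(0x4A)
	fmt.Printf("function=%d operand=%d\n", in.Function, in.Operand) // function=4 operand=10
}
```

Because both fields have a fixed position, decoding needs no table lookup, which is the property the text credits for the processor's speed.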
The processor provides direct support for the OCCAM model of
concurrency and communication. It has a scheduler which enables any
number of concurrent processes to be executed together, sharing the
processor time. Process communication is implemented by memory-to-memory
block move operations, which fully utilize the bandwidth available from
the on-chip RAM. The small number of registers which form the process
context and the use of on-chip RAM combine to provide a fast process
switch time.

In concurrent processing, the uniprocessor still executes programs
sequentially. It implements parallel processes by sharing its time
between the set of processes which are active at any instant. A process
is active when it is not waiting for input or output. When a process
attempts to communicate, it is set inactive to wait for the
communication, and the next process on the active queue starts to
execute. When the communication channel becomes ready, the message is
passed and the waiting process is linked to the end of the active
process queue. The process then continues to execute when its turn in
the queue comes up.
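The active-queue behaviour described above can be sketched as a FIFO of process names. This is a hypothetical Go model of the bookkeeping only, not the T424 microcode: blocking removes the running process from the queue, and readiness links it back at the end.

```go
package main

import "fmt"

// Scheduler keeps the active process queue; process names are hypothetical.
type Scheduler struct {
	active  []string        // FIFO queue of processes ready to run
	waiting map[string]bool // processes descheduled while awaiting communication
}

func NewScheduler(procs ...string) *Scheduler {
	return &Scheduler{active: append([]string(nil), procs...), waiting: map[string]bool{}}
}

// Current returns the process at the front of the active queue, or "" if none.
func (s *Scheduler) Current() string {
	if len(s.active) == 0 {
		return ""
	}
	return s.active[0]
}

// BlockCurrent deschedules the running process because it must wait for input
// or output; the next process on the active queue becomes current.
func (s *Scheduler) BlockCurrent() {
	if len(s.active) == 0 {
		return
	}
	p := s.active[0]
	s.active = s.active[1:]
	s.waiting[p] = true
}

// Ready is called when a waiting process's channel becomes ready: the process
// is linked to the END of the active queue.
func (s *Scheduler) Ready(p string) {
	if s.waiting[p] {
		delete(s.waiting, p)
		s.active = append(s.active, p)
	}
}

func main() {
	s := NewScheduler("A", "B", "C")
	s.BlockCurrent()         // A waits for a channel, so B runs
	fmt.Println(s.Current()) // B
	s.Ready("A")             // A's communication completes
	fmt.Println(s.active)    // [B C A]
}
```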

The T424 processor supports two levels of priority. High priority
processes can be used for message through-routing or for fast response
to external events. A PRI PAR (priority parallel) process may have two
components. A queue of active processes is maintained for each priority
level. A priority 1 (low priority) process is executed only when there
are no active priority 0 (high priority) processes. If no priority 0
process is active, the latency (the time from an external channel
becoming ready to the start of the first instruction of the relevant
waiting priority 0 process) is typically 600 ns (maximum 2600 ns).
Otherwise, if a priority 0 process is already executing, the relevant
waiting process is linked to the end of the priority 0 queue.
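The selection rule between the two queues can be sketched in Go as below. It is a hypothetical model of queue selection only; it does not model preemption of a running low priority process or the latency figures quoted above.

```go
package main

import "fmt"

// PriorityQueues holds one FIFO per priority: index 0 = high, 1 = low.
type PriorityQueues struct {
	queues [2][]string
}

// Activate links a process to the end of the queue for its priority.
func (q *PriorityQueues) Activate(name string, priority int) {
	q.queues[priority] = append(q.queues[priority], name)
}

// Next removes and returns the process to run next. A priority 1 process is
// chosen only when the priority 0 queue is empty.
func (q *PriorityQueues) Next() (string, bool) {
	for p := range q.queues {
		if len(q.queues[p]) > 0 {
			name := q.queues[p][0]
			q.queues[p] = q.queues[p][1:]
			return name, true
		}
	}
	return "", false
}

func main() {
	var q PriorityQueues
	q.Activate("background", 1)
	q.Activate("linkHandler", 0)
	q.Activate("display", 1)
	for name, ok := q.Next(); ok; name, ok = q.Next() {
		fmt.Println(name) // linkHandler, then background, then display
	}
}
```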
The processor also supports the OCCAM concept of alternative inputs,
which allows hardware and software interrupts to be programmed in a
high-level language.

The processor includes a timer: a process can read 
the time or can wait until the time reaches a value. 

The processor sees memory as a linear signed address space of 4 Gbytes
(2^32 bytes), with no difference between on-chip and off-chip memory
except for performance. The signed address space allows address
calculations to be handled in the same way as arithmetic calculations.
This not only simplifies the processor (and compiler) design but also
means that arithmetic overflow can be treated uniformly.

The processor bootstraps itself either from program 
in external ROM or from any of the INMOS serial links. Any 
error detected by the processor can drive an error signal 
which can be used to stop the processor so that the error is 
contained and the cause of the error can be analysed. 

Because an external memory cycle can be used either to access one data
word or to fetch four instructions (each instruction is one byte, so a
32-bit word holds four), most programs require more memory accesses for
data than for program code. In general, therefore, better performance is
obtained by placing the program off chip rather than placing the data
off chip. If both the program and the local variables can be held on
chip, so that most memory accesses are to on-chip RAM, the performance
will be close to that of having all program and data on chip.

A high priority process running in the transputer takes priority over
all low priority activity, including communication. Communication to a
high priority process occurs concurrently with another high priority
process running in the transputer. For a T424 processor with a 50 ns
cycle time (Table 3), if all processes are running at the same priority
and all the links are transferring data at ... Mbits/s in both
directions at the same time, using internal memory, the maximum
interference with processor performance is about 8 %. The average
interference is negligible.

The overall size of a program is given by the sum of the sizes of its
program elements. All timing averages and the maximum program execution
time are given by the sum of the execution times of the individual
program elements. If the program is held in external memory, the
external program fetch time must be added to obtain the program
execution time; if data is held in external memory, the external data
access time must also be added. The processor shares memory cycles with
its input/output interfaces. Each concurrent access by an interface
channel delays the processor by an average of 30 ns. The maximum
reduction in performance is 10 %; under typical conditions the reduction
is negligible.
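The additive timing rule above can be written as a small estimator. In this Go sketch the element times are invented for illustration, the 30 ns figure is the average delay from the text, and the 10 % worst-case bound is not modelled.

```go
package main

import "fmt"

// Element is one program element with its on-chip execution time and any
// extra external memory cost, all in nanoseconds (hypothetical values).
type Element struct {
	ExecNs     float64 // execution time with program and data on chip
	ExtFetchNs float64 // extra time to fetch this element's code off chip
	ExtDataNs  float64 // extra time for this element's off-chip data
}

// EstimateNs sums the element times, including external fetch and data time,
// and adds the average 30 ns delay for each concurrent interface access.
func EstimateNs(elems []Element, interfaceAccesses int) float64 {
	total := 0.0
	for _, e := range elems {
		total += e.ExecNs + e.ExtFetchNs + e.ExtDataNs
	}
	const avgInterfaceDelayNs = 30.0
	return total + float64(interfaceAccesses)*avgInterfaceDelayNs
}

func main() {
	prog := []Element{
		{ExecNs: 100},                  // held on chip
		{ExecNs: 200, ExtFetchNs: 150}, // code held off chip
	}
	fmt.Println(EstimateNs(prog, 2)) // 100 + 350 + 2*30 = 510
}
```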

For integer computations, the IMS T424 transputer is nearer to a
dedicated digital signal processing device than to a mainstream 32 bit
microprocessor. However, in common with other designs, it does not
provide built-in floating point operations. The design of its
instruction set, including appropriate shifting operations, enables an
efficient software implementation of the IEEE floating point
specification, comparable in speed with an established floating point
coprocessor. A software library implementing both 32 and 64 bit
floating point will be available from INMOS during 1985, and is likely
to satisfy many applications' needs. Using appropriate sequences of
instructions, the T424 can perform arbitrary length integer arithmetic,
real arithmetic, fractional arithmetic and fixed point arithmetic. An
obvious possibility for a new transputer product is one with embedded
floating point capability. [Ref. 7]


3. Links


The IMS T424 transputer has four standard INMOS serial links which
provide high speed intercommunication between transputer products and
enable a rich variety of networks to be constructed. The link interfaces
and the processor all operate concurrently, and each link interface
independently provides block message transfers to and from the memory
of the transputer.


Each autonomous link interface has an output and an input signal, both
of which are used to carry data and protocol information. A message is
transmitted as a sequence of bytes. After transmitting a data byte, the
sending transputer waits until an acknowledge has been received,
signifying that the receiving transputer is ready to receive another
byte, before transmitting the next byte. Each link implements two OCCAM
channels, one in each direction. The protocol allows the receiving
transputer to transmit an acknowledge as soon as it starts to receive a
data byte, and it provides end-to-end channel synchronization. This
asynchronous protocol guarantees reliable transmission in spite of
possible delays in either the sending or the receiving transputer.
Message transmission via a link to or from a process executing on the
T424 is performed by an autonomous block transfer engine. The process
itself is descheduled during the transfer, allowing the transputer
processor to execute other processes. During transmission of a message,
both the sending and the receiving processes are set inactive, and they
are linked to the end of their respective active queues only after the
final byte has been acknowledged.
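The byte-by-byte handshake can be sketched with two Go channels, one carrying data bytes and one carrying acknowledges in the opposite direction. This models only the send/acknowledge ordering, not bit-level serialization or the early acknowledge timing; the function names are hypothetical.

```go
package main

import "fmt"

// sendMessage transmits msg one byte at a time on data and waits for an
// acknowledge on ack before sending the next byte.
func sendMessage(msg []byte, data chan<- byte, ack <-chan struct{}) {
	for _, b := range msg {
		data <- b // transmit one data byte
		<-ack     // wait until the receiver can take another byte
	}
}

// receiveMessage reads n bytes, acknowledging each one after taking it.
func receiveMessage(n int, data <-chan byte, ack chan<- struct{}) []byte {
	buf := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		b := <-data
		ack <- struct{}{} // ready for the next byte
		buf = append(buf, b)
	}
	return buf
}

func main() {
	data := make(chan byte)    // one direction of a link
	ack := make(chan struct{}) // acknowledges travel the other way
	msg := []byte("OCCAM")
	done := make(chan []byte)
	go func() { done <- receiveMessage(len(msg), data, ack) }()
	sendMessage(msg, data, ack) // returns after the final byte is acknowledged
	fmt.Println(string(<-done)) // OCCAM
}
```

As in the text, the sender is not released until the last byte has been acknowledged.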
Table 3 shows the different speeds of the IMS T424 transputer.


The links support a universal standard bit rate of twice the input
clock frequency (10 Mbit/s with a 5 MHz input clock). All transputers,
of whatever word length and speed selection, support this universal
communications frequency as a product range standard. An internal link
clock is derived from the input clock, and data bits are transmitted
synchronously with this clock. Data reception is asynchronous.

TABLE 3
SPEED OF IMS T424

INSTRUCTION     PROCESSOR       PROCESSOR
THROUGHPUT      CLOCK SPEED     CYCLE TIME
(MIPS)          (MHz)           (ns)
10              20              50
...             ...             ...

As shown in Table 3, the maximum speed of the IMS T424 transputer is
20 MHz, providing 10 MIPS of throughput. The data rate on each link can
be programmed using the link set configuration channel [Ref. 4]. A link
rate of 20 Mbits/s gives a maximum data rate of 1.8 MBytes/s on a
channel.
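The 1.8 MBytes/s figure is consistent with each data byte costing 11 bits on the wire. The Go sketch below assumes the standard INMOS data packet (two start bits, eight data bits, one stop bit), which the text above does not state, and assumes acknowledges overlap with data as the protocol permits.

```go
package main

import "fmt"

// bitsPerDataByte is the assumed wire cost of one data byte:
// two start bits + eight data bits + one stop bit.
const bitsPerDataByte = 11

// standardBitRateMbps returns the standard link rate: twice the input clock.
func standardBitRateMbps(inputClockMHz float64) float64 {
	return 2 * inputClockMHz
}

// linkRateMBps returns the one-way data rate in Mbytes/s for a link bit rate
// in Mbits/s, assuming acknowledges cost no extra time on the data wire.
func linkRateMBps(bitRateMbps float64) float64 {
	return bitRateMbps / bitsPerDataByte
}

func main() {
	fmt.Printf("standard rate: %.0f Mbits/s\n", standardBitRateMbps(5)) // 10
	fmt.Printf("at 20 Mbits/s: %.2f Mbytes/s\n", linkRateMBps(20))      // 1.82
}
```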
4. Peripheral Interface 


The peripheral interface is an 8 bit bidirectional bus which may be used
to input and output sequences of bytes. It provides access to industry
standard devices such as eight bit parallel controllers for auxiliary
memory. The interface controller provides a block message transfer
capability between memory and the peripheral interface. There are two
control lines which may be used to address external devices, and an
"Event" input to provide an interrupt capability.
The "Event" input may be used to communicate with a waiting process and
hence cause it to be scheduled. This provides an input functionally
similar to an interrupt, in a manner consistent with the process model
of the transputer. The typical latency for this interrupt is 600 ns. The
"Event" input can also be used to enable the peripheral interface to
respond to being accessed from a standard microprocessor bus.

The interface is accessed via four standard input channels and four
standard output channels. All eight channels use the same 8 bit path and
transfer handshake, with the processor initiating the transfer. The
transfers are synchronized to a separate external clock, which need not
have any fixed relationship with the transputer input clock.
Asynchronous operation is also permitted, but at a lower speed than
synchronous operation.


Externally addressable devices may be connected via the peripheral
interface, for instance by using one output channel as the address
channel, another as the write data channel, and one input channel as the
read data channel. Both addresses and data may be arbitrarily long
sequences of bytes. The 4 Mbytes/s data rate provided by the interface
allows the connection of high performance peripheral chips without the
need for FIFOs or DMA controllers.
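The three-channel addressing scheme above can be sketched in Go. For brevity, and unlike the interface described in the text, addresses and data here are single bytes; the device, its 256 registers and all function names are hypothetical.

```go
package main

import "fmt"

// device models a hypothetical peripheral with 256 one-byte registers, reached
// through an address channel, a write data channel and a read data channel.
func device(addr, write <-chan byte, read chan<- byte, quit <-chan struct{}) {
	regs := make([]byte, 256)
	for {
		select {
		case a := <-addr:
			// After an address, the host either writes or reads one byte.
			select {
			case v := <-write:
				regs[a] = v
			case read <- regs[a]:
			}
		case <-quit:
			return
		}
	}
}

// writeReg sends an address, then a data byte (host side).
func writeReg(addr, write chan<- byte, a, v byte) {
	addr <- a
	write <- v
}

// readReg sends an address, then takes one byte from the read channel.
func readReg(addr chan<- byte, read <-chan byte, a byte) byte {
	addr <- a
	return <-read
}

func main() {
	addr, write, read := make(chan byte), make(chan byte), make(chan byte)
	quit := make(chan struct{})
	go device(addr, write, read, quit)
	writeReg(addr, write, 7, 42)
	fmt.Println(readReg(addr, read, 7)) // 42
	close(quit)
}
```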
III. SOFTWARE


A. OCCAM


OCCAM, a new programming language, has been designed in conjunction
with the INMOS transputer [Ref. 8]. It supports concurrent applications
in which many parts of a system operate separately and interact. OCCAM
can capture the hierarchical structure of a system by allowing an
interconnected set of processes to be regarded from the outside as a
single process. At any level of detail, the programmer is only
concerned with a small and manageable set of processes.


B. WHY OCCAM ? 


The novelty of OCCAM is in its treatment of concurrency. OCCAM enables
the behaviour of concurrent systems to be explicitly programmed and
controlled. It also gives the efficiency of an assembler, in terms of
program density and performance, while offering the productivity and
reliability advantages of programming in a high level language.

OCCAM enables the programmer to express a program in terms of concurrent
processes which communicate by sending messages through communication
channels. This has two important consequences. First, it gives the
program a clear and simple structure, as the individual processes
operate largely independently. Second, it allows the program to exploit
the performance of many computing components, as each concurrent process
may be executed by an individual processor.

OCCAM provides a methodology for designing present and future
concurrent systems using transputers in just the same way that Boolean
algebra provides a methodology for designing today's electronic systems
from logic gates.


The task of the system designer is eased because of the architectural
relationship between OCCAM and the transputer. A program running in a
transputer is formally equivalent to an OCCAM process, so that a network
of transputers can be described directly as an OCCAM program. [Ref. 3]


C. GENERAL FEATURES OF OCCAM 


1. Processes
A process starts, performs a sequence of actions, and then terminates.
Each action may be an assignment, an input, an output or SKIP (Table 4).
An assignment changes the value of a variable, an input receives a
value from a channel and an output sends a value to a channel. The
process SKIP has no effect. The process STOP starts but never proceeds;
its main use is to prevent an erroneous process from proceeding. At any
time between start and termination, a process may be ready to
communicate on one or more of its channels. Each channel provides a
one-way connection between two concurrent processes: one of the
processes may only output (write) to the channel, and the other may only
input (read) from it.

A process may be ready and waiting to input from any one of a number of
channels. In this case, the input is read from the first channel which
is used for output by another process. Communication is synchronous:
the value to be transmitted is copied from the output process to the
input process only when both an input process and an output process are
ready to communicate on the same channel.

OCCAM may be used to program an individual transputer. The transputer
shares its time between the concurrent processes, and the channels are
implemented by transferring values within main memory.
An OCCAM program may be executed by a network of transputers.
Nevertheless, the same program may be executed unchanged by a smaller
network or even by a single transputer. Each transputer with local
storage executes a process with local variables, and each connection
between two transputers implements a channel between two processes.


As mentioned above, the three primitive processes are input, output and
assignment. OCCAM programs are built from these three primitives, given
in Table 4. They can be combined sequentially or concurrently to create
more complex processes, and so they form the building blocks for a
program.


TABLE 4
PRIMITIVE PROCESSES OF OCCAM

PRIMITIVES     SYNTAX
ASSIGNMENT     variable := expression
INPUT          channel ? variable
OUTPUT         channel ! expression
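For readers more familiar with modern languages, the three primitives map directly onto Go: `c <- e` for output, `x = <-c` for input, and ordinary assignment. The channel and variable names below are hypothetical.

```go
package main

import "fmt"

// primitives performs each OCCAM primitive once, using Go equivalents.
func primitives() (x, y int) {
	c := make(chan int) // hypothetical channel c
	go func() {
		c <- 5 // OUTPUT:     c ! 5
	}()
	x = <-c   // INPUT:      c ? x
	y = x + 1 // ASSIGNMENT: y := x + 1
	return x, y
}

func main() {
	x, y := primitives()
	fmt.Println(x, y) // 5 6
}
```

An input that discards its value, written 'c ? ANY' in OCCAM, corresponds to a bare `<-c` in Go.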





a. Assignment 


An assignment is indicated by the symbol ':='. It transfers the value
of its expression to the named variable: the expression is evaluated,
the variable is set to the resulting value, and then the assignment
process terminates. The variable may be a simple variable or an element
of a vector of variables selected using either byte or word subscripts.
An assignment 'y := e' sets the value of the variable y to the value of
the expression e and then terminates. For example, 'y := 0' sets y to
zero, and 'y := y + 1' increases the value of y by 1.
b. Input 


An input process reads (receives) a value from a channel into a
variable. The '?' symbol denotes the input process. This primitive reads
a value from the specified channel and provides synchronization with a
concurrent process that outputs on the same channel. An input primitive
sets the value of a variable to a value read from a channel, and it
waits until an output primitive using the same channel is executed in
parallel with the input.

An input 'c ? v' reads a value from the channel c, assigns it to the
variable v and then terminates. An input 'c ? ANY' reads a value from
the channel c and discards it.

A multiple input is equivalent to a sequence of separate input
processes, one for each variable in turn, in left to right order. Each
input is separately synchronized with an output process being executed
in parallel. Each variable may be a simple variable, or a word or byte
subscripted element of a vector of variables.
c. Output


An output process writes (sends) the value of 
the expression to the channel. An output is indicated by the 
symbol '!'. An output waits until an input using the same 
channel is executed. It then outputs the value of the 
expression to the channel and terminates. A multiple output 
is equivalent to a sequence of outputs, which writes the 


value of each expression in turn, in left to right order. 
Each output is separately synchronized with an input process executed
in parallel.

An output 'c ! e' writes the value of the expression e to the channel c.
An output 'c ! ANY' writes an arbitrary value to the channel c.

2. Constructs

A number of processes may be combined to form a sequential, parallel,
conditional, alternative or repetitive construct. A construct is itself
a process, and may be used as a component of another construct. Each
component process of a construct is written two spaces further from the
left hand margin, to indicate that it is part of the construct.
a. Sequential 


Many applications need to perform a number of steps one after another.
Figure 3.1 shows the flow diagram of this sequential construct, in which
the component processes are executed one after another.

Figure 3.1 Flow Diagram of the Sequential Construct


A sequential process takes the form of the 


keyword SEQ followed by the component processes, each on a 
new line, all at an extra level of indentation as shown in
Table 5. 


TABLE 5
SEQUENTIAL CONSTRUCT

SEQ                    SEQ
  process 1              c ? x
  process 2              x := x + 1
  ...                    d ! x
  process n





The component processes, process 1, process 2, ..., process n, are
executed one after another. Each component process starts after the
previous one terminates, and the construct terminates after the last
component process terminates. For example, the sample SEQ construct
given in Table 5 reads a value, adds one to it, and then writes the
result.

SEQ and its component processes can be regarded as a single process.
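The SEQ example (read a value, add one, write the result) can be mirrored by a Go function whose statements run strictly in order. The channel names c and d are hypothetical.

```go
package main

import "fmt"

// increment mirrors the sequential construct
//   SEQ
//     c ? x
//     x := x + 1
//     d ! x
// Each step starts only after the previous one has terminated.
func increment(c <-chan int, d chan<- int) {
	x := <-c  // c ? x
	x = x + 1 // x := x + 1
	d <- x    // d ! x
}

func main() {
	c, d := make(chan int), make(chan int)
	go increment(c, d)
	c <- 41
	fmt.Println(<-d) // 42
}
```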
b. Parallel

If many processes are required to run as a concurrent system, a
parallel construct can be formed, as shown in Figure 3.2.

As seen in Table 6, the keyword PAR is followed by a number of
component processes, each starting on a new line and indented. The
effect is to execute all of the component processes together, which is
achieved by sharing the processor time between the set of active
processes.

The parallel construct terminates after all the 
component processes are terminated. If there is no component 


process, the construct terminates immediately. 
Figure 3.2 Flow Diagram of the Parallel Construct


TABLE 6
PARALLEL CONSTRUCT

PAR                    PAR
  process 1              c ? x
  process 2              d ! y
  ...
  process n


For example, in the parallel construct of Table 6, two component
processes are executed together and are called concurrent processes.
This sample construct allows input to x and output from y to take place
together.


Concurrent processes communicate using channels. When an input from a
channel c and an output to the same channel c are executed together,
communication takes place when both the input and the output are ready.
The value is copied from the writing process to the reading process,
and execution of both concurrent processes then continues.

Variables are not used for communication between 
the component processes of a parallel construct. However, a 
variable may be used in two or more component processes, 
provided that no component process changes its value by 
input or assignment. Two component processes of a parallel 
construct may communicate by sending values using a channel. 
One contains outputs to the channel, and the other contains 
the inputs from the channel. The processes are said to be 
connected by the channel. No other component of the parallel 


construct may use the same channel. 
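A PAR construct can be sketched in Go with a `sync.WaitGroup`: all components start together and the construct terminates only when every component has terminated (immediately if there are none). The two partner processes in `main` are hypothetical stand-ins for whatever else uses channels c and d; note that each variable is changed by only one component, as the rule above requires.

```go
package main

import (
	"fmt"
	"sync"
)

// par runs the given processes concurrently and returns only when all of
// them have terminated; with no processes it returns immediately.
func par(procs ...func()) {
	var wg sync.WaitGroup
	for _, p := range procs {
		wg.Add(1)
		go func(p func()) {
			defer wg.Done()
			p()
		}(p)
	}
	wg.Wait()
}

func main() {
	c, d := make(chan int), make(chan int)
	var x, received int
	y := 7
	par(
		func() { x = <-c },        // c ? x
		func() { d <- y },         // d ! y
		func() { c <- 3 },         // hypothetical partner writing on c
		func() { received = <-d }, // hypothetical partner reading from d
	)
	fmt.Println(x, received) // 3 7
}
```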
c. Conditional


A component conditional takes the form of a conditional expression
followed by a process, and it is able to execute if the expression
evaluates to TRUE. As shown in Table 7, a conditional construct takes
the form of IF followed by component conditionals. The construct is able
to execute if one of its component conditionals is able to execute.

Process 1 is executed if conditional expression 1 is TRUE; otherwise process 2 is executed if conditional expression 2 is TRUE, and so on. Only one of the processes is executed, and the construct then terminates. If no component is able to execute, the construct terminates without any effect. Figure 3.3 shows the flow diagram of the conditional construct. A sample of this construct in Table 7 increases n only if the value of e is 0.
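A minimal Python sketch of the conditional rule, assuming each component conditional is supplied as a pair of zero-argument functions (condition, process). `occam_if` is a hypothetical helper, not an OCCAM feature: only the first conditional whose expression is TRUE runs its process, and if none is TRUE the call has no effect.

```python
def occam_if(*conditionals):
    """Each conditional is a (condition, process) pair of zero-argument
    callables. Run the process of the first TRUE condition only."""
    for condition, process in conditionals:
        if condition():
            process()
            return True      # one process executed; the construct terminates
    return False             # no component able to execute: no effect


# Usage: increase n only if e is 0, as in the sample described above.
state = {"e": 0, "n": 5}
occam_if((lambda: state["e"] == 0,
          lambda: state.update(n=state["n"] + 1)))
```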
Figure 3.3 Flow Diagram of the Conditional Construct


TABLE 7
CONDITIONAL CONSTRUCT

IF
  conditional expression 1
    process 1
  conditional expression 2
    process 2
  ...


d. Alternative


Sometimes a process has a number of channels associated with it and needs to perform one of a number of actions depending on which channel first sends it a message. This is achieved using the alternative construct (Figure 3.4), which chooses just one of its inputs for execution. The keyword ALT followed by guarded processes represents this construct, as shown in Table 8.


Figure 3.4 Flow Diagram of the Alternative Construct


TABLE 8
ALTERNATIVE CONSTRUCT

ALT                        ALT
  guard-process 1            index ? ANY
    process 1                  number := number + 1
  guard-process 2            sum ? ANY
    process 2                  SEQ
                                 out ! number
                                 number := 0
An alternative construct waits until one of its guarded processes (inputs) is ready to execute. One of the ready guarded processes is then selected and executed, and the construct then terminates. A guarded process starting with an input from a channel is ready if an output process is waiting to write to the channel. If the guarded process is selected, its component process is executed. If a guard contains an expression followed by an input or wait, the guarded process is ready only if the value of the expression is TRUE and the input or wait is ready. If a guarded process is itself an alternative construct, then it is ready if one or more of the component guarded processes of that alternative construct is ready. If more than one guarded process becomes ready at the same time, an arbitrary one is selected. This may occur if the guarded processes contain inputs on the same channel.

For example, the sample construct in Table 8 either reads a signal from channel index and increases the variable number by 1, or reads from channel sum, outputs the current value of number on channel out, and resets number to zero.
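The readiness rule and the Table 8 example can be sketched in Python as follows. The helper names are hypothetical, each guard is reduced to a (boolean part, input ready) pair, and the arbitrary choice among ready guards is modelled with `random.choice`; a real ALT would keep waiting rather than return `None`.

```python
import random


def ready_guards(guards):
    """guards: list of (boolean_part, input_ready) pairs. A guard with no
    boolean part can be modelled with boolean_part=True."""
    return [i for i, (condition, input_ready) in enumerate(guards)
            if condition and input_ready]


def alt_select(guards, rng=random):
    """Pick one ready guarded process; the choice among several is arbitrary.
    Returns None when none is ready (a real ALT would wait instead)."""
    ready = ready_guards(guards)
    return rng.choice(ready) if ready else None


def table8_counter(events):
    """Replay the Table 8 example for a sequence naming the channel that
    became ready ('index' or 'sum'). Returns (values output, final number)."""
    number, out = 0, []
    for channel in events:
        if channel == "index":       # index ? ANY
            number = number + 1      #   number := number + 1
        elif channel == "sum":       # sum ? ANY
            out.append(number)       #   out ! number
            number = 0               #   number := 0
    return out, number
```

For example, the event sequence index, index, index, sum, index outputs 3 on out and leaves number at 1.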
e. Repetitive 


The repetitive construct takes the form of the keyword WHILE followed by a conditional expression, followed by a single component process indented on the next line. As shown in Figure 3.5, the repetitive construct repeatedly executes the process until the value of the condition is FALSE.

The component process in the repetitive construct (Table 9) is executed repeatedly as long as the expression is TRUE; when the expression becomes FALSE, the construct terminates. If the conditional expression is initially FALSE, the process is not executed and the construct terminates right away. For example, the repetition in Table 9 converts a negative number to a positive one.

Figure 3.5 Flow Diagram of the Repetitive Construct

TABLE 9
REPETITIVE CONSTRUCT

WHILE conditional expression      WHILE a < 0
  process 1                         a := 0 - a
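As a sketch, assuming the Table 9 example reads `WHILE a < 0` with the body `a := 0 - a`, the same behaviour in Python is shown below. The iteration count makes visible that the body is skipped entirely when the condition is initially FALSE.

```python
def make_positive(a):
    """Python rendering of: WHILE a < 0 / a := 0 - a.
    Returns (final value, number of times the body ran)."""
    iterations = 0
    while a < 0:          # the condition is tested before each execution
        a = 0 - a
        iterations += 1
    return a, iterations
```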
f. Replicator


A replicator is used with a construct to replicate the component process a number of times (Table 10). Figure 3.6 shows the flow diagram of replication.

A replicator can be used with SEQ to provide a conventional loop. For example, 'SEQ i = [0 FOR n]' causes the process to be executed n times.
Figure 3.6 Flow Diagram of the Replicator Construct


TABLE 10
REPLICATOR CONSTRUCT

SEQ i = [base FOR count]      PAR i = [base FOR count]
  process                       process i


Replication can be used with PAR to construct an array of concurrent processes. For example, 'PAR i = [0 FOR n]' constructs an array of n similar processes. The index i takes the values 0, 1, ..., n-1 in process 0, process 1, ..., process n-1, respectively.

A replicator can also be used with ALT to read from an array of channels.

The replicator declares an identifier to be the replicator index, such as i, giving its base value and a count of the number of replications required. Its effect is to form a sequential, parallel, alternative or conditional construct containing count components, by replicating the component process and substituting successive integer values for the replicator index (starting at base). The value substituted for the replicator index in the last component is (base + count) - 1.
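The index values produced by a replicator, and the SEQ and PAR forms, can be sketched in Python. The function names are hypothetical, and threads stand in for concurrent OCCAM processes.

```python
import threading


def replicator_indices(base, count):
    """Values substituted for the index in [base FOR count]:
    base, base+1, ..., (base + count) - 1. A count of zero or less
    gives an empty construct."""
    return list(range(base, base + count)) if count > 0 else []


def seq_replicated(base, count, process):
    """SEQ i = [base FOR count]: run the component once per index, in order."""
    for i in replicator_indices(base, count):
        process(i)


def par_replicated(base, count, process):
    """PAR i = [base FOR count]: one concurrent instance per index value;
    returns when every instance has terminated."""
    threads = [threading.Thread(target=process, args=(i,))
               for i in replicator_indices(base, count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```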

The replicator index can be used in expressions but not in constant expressions, and it may not be changed by assignment or input. An implementation may restrict the values of base and count to be constants, particularly when a replicator is used to form a parallel construct. If count evaluates to zero or less, an empty construct is generated. This has the effect of immediate termination for sequential, parallel and conditional constructs, and the effect of never being ready to execute for alternative constructs.
3. Declaration Types


Every variable, expression and value has a type, which may be a primitive type or an array type. The type defines the length and interpretation of values of the type. Table 11 shows the primitive types which are available in all implementations.


TABLE 11
DECLARATION TYPES

constant
variable
channel
boolean
vector
Array types are constructed from component types. For example, [n] T is an array type constructed from n components of type T. A channel type is either CHAN or an array type in which every component type is a channel type. For example, 'CHAN x :' declares x as a new channel.
4. Named Processes 


A process (procedure or subroutine) may be given a name. For example, Table 12 shows a sample that defines the named process square.


TABLE 12
A NAMED PROCESS

PROC square (INT n, sqr) =
  sqr := n * n


Process square is called with its name and actual parameters. For example, the call 'square (x, sqrx)' causes 'sqrx := x * x' to be executed.


5. Expressions 


An expression is constructed from operators, variables, numbers, the truth values TRUE and FALSE, and the brackets ( and ).

The boolean operators AND, OR and NOT operate on boolean values and yield boolean results.


The arithmetic operators +,-,*,/ and \ yield the 
arithmetic sum, difference, product, quotient and remainder 
respectively. Both operands must be of the same integer or 
real type. 
The operators /\, \/ and >< operate on integers and yield the bitwise and, or and exclusive or of the operands, respectively.
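A short Python sketch of the integer operators described above. It assumes that OCCAM integer division truncates toward zero, so the quotient and remainder operators are modelled by hypothetical helpers rather than Python's `//` and `%`, which round toward negative infinity. The bitwise and, or and exclusive or map directly to Python's `&`, `|` and `^`.

```python
def occam_div(a, b):
    """Integer quotient, assumed here to truncate toward zero
    (Python's // floors instead, so -7 // 2 == -4)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def occam_rem(a, b):
    """Remainder consistent with occam_div:
    a == occam_div(a, b) * b + occam_rem(a, b)."""
    return a - occam_div(a, b) * b


# Bitwise and, or, exclusive or on integers.
x, y = 0b1100, 0b1010
bitwise = (x & y, x | y, x ^ y)   # (0b1000, 0b1110, 0b0110)
```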

The relational operators yield a result of type boolean, and both operands must be of the same type. The relational operators = and <> operate on any primitive type and represent equals and not equals. The operators >, <, >= and <= operate on integers and reals and represent greater than, less than, greater than or equal to, and less than or equal to.

Type conversion may be performed by using one of the 
type conversion operators $, SROUND, and STRUNC. 

A string is represented as a sequence of ASCII characters enclosed in double quotation marks ("). If the string has n characters, then it is an array of type [n] BYTE.


6. Configuration 


Configuration is the allocation of processing resources to the concurrent processes in a program. It is used to meet speed and response requirements by distributing programs over separate, interconnected computers, and by placing and prioritizing processes on single computers. Configuration does not affect the logical behaviour of a program, and simple implementations may omit or ignore some or all of the configuration facilities. However, configuration does enable the program to be arranged so that performance requirements are met.

Every computer has a local memory and a set of numbered ports. A physical connection between two computers connects a port on one computer to a port on the other computer. This implements up to two channels between the computers, one in each direction.
a. Prioritized Parallel


A parallel construct may be configured for a single transputer. The transputer then shares its time between the component processes, and the channels are implemented by values in store. For this reason, OCCAM also contains the prioritized parallel construct, declared as PRIPAR, in addition to the regular parallel construct. This construct provides a different priority for each component process: each component process of a PRIPAR construct is executed at a separate priority, the first process having the highest priority and the last the lowest. If P and Q are two concurrent processes with priorities p and q such that p < q, then Q is only allowed to proceed when P cannot proceed. An implementation may restrict the number of components that a prioritized parallel construct can have. An alternative construct can also be used to provide prioritized input primitives.

On any individual transputer, the outermost parallel construct may be configured to prioritize its components. A prioritized parallel (PRIPAR) construct ensures that a higher priority process always proceeds in preference to a lower priority one. The progress of a higher priority process is not affected by any lower priority one, except through communication on connecting channels. If several concurrent processes at the same priority are able to proceed, each one is given an opportunity to proceed in turn. The T424 transputer supports two levels of priority: priority 0 (high priority) and priority 1 (low priority).
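The two-level priority rule can be sketched as a small Python scheduler model. This is an illustration under stated assumptions, not the T424's actual scheduler: each process is reduced to a number of work units, blocking on channels is ignored, and `run_prioritized` is a hypothetical name. A priority 0 process always runs before any priority 1 process, and processes at the same level take turns.

```python
from collections import deque


def run_prioritized(processes, steps):
    """processes: list of (name, priority, work_units), priority 0 (high)
    or 1 (low). Each step runs one unit of the highest-priority process
    that still has work; equal-priority processes alternate (round robin).
    Returns the order in which processes ran."""
    queues = {0: deque(), 1: deque()}
    remaining = {}
    for name, priority, work in processes:
        queues[priority].append(name)
        remaining[name] = work
    trace = []
    for _ in range(steps):
        level = next((p for p in (0, 1) if queues[p]), None)
        if level is None:
            break                        # nothing is able to proceed
        name = queues[level].popleft()
        trace.append(name)
        remaining[name] -= 1
        if remaining[name] > 0:
            queues[level].append(name)   # back of its own level: take turns
    return trace
```

For example, `run_prioritized([("A", 0, 2), ("B", 0, 1), ("L", 1, 2)], 10)` gives the order A, B, A, L, L: the low-priority process only proceeds once neither high-priority process can.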
b. Placed Parallel


A parallel construct may be configured for a network of transputers by using the PLACED PAR construct. Each component process (termed a placement) is executed by a separate transputer. Port allocations are used to allocate channels to ports. The variables used in a placement must be declared within the placement. The values of the timer on different transputers are unrelated. A parallel construct configured for a network may be reconfigured for an individual computer.
IV. MULTIPROCESSOR MULTITRANSPUTER SYSTEM


A "pure" multiprocessor contains two or more processors of approximately comparable capabilities, which share access to all of memory and all input-output (I/O) channels, control units, and peripheral devices. The entire system is controlled by a single operating system. More frequently, multiprocessor systems share only portions of memory and input-output channels. The processors have dedicated memories for storing programs and local data, and share data in segments of common memory.


Multiprocessor systems usually make it easier for the user to access the system, they generally provide increased performance through resource sharing, and they often increase the availability of a system. Multiprocessing systems can provide adaptability and rapid reconfiguration, with the system functioning at different times as a very large and complex problem solver, as a network of smaller machines each dedicated to a unique task, or as something in between. A network of microprocessors can quite often duplicate the capability of one large, expensive system at lower cost. Such networks can also provide increased reliability, since the total system can continue to operate despite individual processor failures, albeit with reduced capabilities, provided that some of the links between the processors remain intact. Also, since redundancy can be achieved at a lower cost using processors distributed over a large area, the survivability of the system, particularly in military applications, can be increased. Furthermore, a distributed processing system can provide increased, distributed power and responsiveness because it can be closely tailored to the application. Additional multiprocessor systems can be provided as needed to ensure proper response time.
Multiprocessor systems can also be designed to be cost effective when applied to a wide variety of applications, where the number of processors can be determined by the distributed processing requirements. A properly designed distributed processing system threatened by overload can be incrementally expanded by simply adding more processors.

Because of these advantages, multiprocessing systems are used in a large number of applications, such as control of electric power generation, distribution and consumption; safeguard control in nuclear power processing facilities; health care delivery in hospitals and medical centers; climate control, security, waste disposal and fire protection in large buildings; and defense systems.
The disadvantages may or may not outweigh the advantages, depending on the system-unique requirements. On the minus side, the designer may be faced with increased software complexity. Application software may be more costly to develop for a distributed system than for a centralized one. In contrast to a system based on a single central processor with only one executive, a distributed system typically requires each processor to contain its own individual executive, which must be capable of communicating with all the other executives in the total system. This, in turn, requires that each individual executive provide a task-handling capability through which tasks resident in various processors can communicate with each other, and that, in case of local software or hardware errors, diagnostic capabilities exist to localize "bugs". This is not to say that diagnostic or error-checking software is not needed or used in large centralized, single-processor systems; however, diagnostic software development for distributed systems is usually more difficult and costly.
A distributed processing system is also more dependent on communication technology, particularly where the computers are widely dispersed and the peak traffic demands between the computers are high. The design and development of a unique distributed processor system may require expertise in both hardware and software. The advantages and disadvantages of distributed multiprocessing systems are given in Table 13. [Ref. 2]


TABLE 13
MULTIPROCESSOR SYSTEMS ADVANTAGES & DISADVANTAGES

ADVANTAGES                     DISADVANTAGES

Increased reliability          Increased H/S complexity
Increased survivability        Difficult system testing
Increased processing power     Hard failure diagnosis
Increased responsiveness       More communications
Increased modularity           Dependence on communications
System expandability           Unique expertise needed





A. MULTITRANSPUTER CONCEPT


System performance has increased regularly by a factor of ten each decade in the past, as seen in Figure 4.1.

This improvement has been achieved by advances in circuit technology and by increasingly complex systems. VLSI (Very Large Scale Integration) technology offers the potential of much greater circuit complexity in the future, but only modest increases in circuit performance.

Figure 4.1 Throughputs of the Developing Technology

The economics of uniprocessor systems are based on the historical perspective that processing is expensive in comparison with memory. This has led to the von Neumann bottleneck, in which a single processor is connected to vast amounts of memory. The economics of VLSI are different: a single wafer of silicon can now contain 2 megabytes of memory or 256 conventional microprocessors. To exploit this potential, it will be necessary to build systems with a much higher degree of concurrency than is currently possible.

The transputer is designed as a programmable component for implementing such multiprocessing systems. Its architecture is optimized to execute OCCAM, a concurrent programming language. OCCAM views the system as a collection of concurrent processes that communicate with each other and with peripherals through channels. The same OCCAM program that a transputer network executes can run unchanged on a smaller network or on a single transputer. The sending transputer transmits a message as a sequence of bytes and, after each byte, awaits an acknowledgement, which signifies that the receiving transputer is ready to accept another byte. Transmission is continuous because the receiving transputer acknowledges as soon as it starts to receive a data byte. Moreover, this asynchronous protocol guarantees reliable transmission despite sending or receiving delays. [Ref. 9]
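The byte-and-acknowledge handshake can be sketched in Python with two queues standing in for the two directions of a link. The function names and the queue model are assumptions for illustration only; on a real transputer the link hardware performs this handshake itself.

```python
import queue
import threading


def send_message(message, data_wire, ack_wire):
    """Sender: transmit one byte, then wait for its acknowledgement."""
    for byte in message:
        data_wire.put(byte)
        ack_wire.get()            # receiver is ready for another byte


def receive_message(length, data_wire, ack_wire, out):
    """Receiver: acknowledge each byte as soon as it starts to arrive."""
    for _ in range(length):
        byte = data_wire.get()
        ack_wire.put(True)        # acknowledge immediately
        out.append(byte)


def transfer(message: bytes) -> bytes:
    """Send a whole message over the modelled link and return what arrived."""
    data_wire, ack_wire = queue.Queue(), queue.Queue()
    received = bytearray()
    receiver = threading.Thread(
        target=receive_message,
        args=(len(message), data_wire, ack_wire, received))
    receiver.start()
    send_message(message, data_wire, ack_wire)
    receiver.join()
    return bytes(received)
```

Because the sender never has more than one unacknowledged byte outstanding, a slow receiver simply slows the sender down rather than losing data.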
According to the Japanese, an intelligent interaction between people and computers can be achieved with computers that perform a thousand times faster than present-day systems. This will only be possible using concurrency, and the transputer has been designed to make such fifth generation systems a possibility. [Ref. 10]

Transputers have special instructions to schedule concurrent processes and to provide communication between them. The transputer executes 5 or more MIPS (million instructions per second) even when not used in parallel. Hardware supports the parallel-processing language, while a memory-intensive architecture speeds execution. [Ref. 11]


A concurrent system is first and foremost a multiprocessor system. The term 'concurrent' is also used in the context of multi-tasking systems; such systems are better described as pseudo-concurrent. Concurrent systems are likely to be no easier to design and implement than non-concurrent systems. Even for sequential systems, the advantage of design and implementation using a high-level language rather than machine-level programming is well recognized; it therefore seems that machines for concurrent systems should encompass the abilities of excellent high-level language machines on top of any unique concurrency aspects. [Ref. 12]

Arrays, pipelines and loops of transputers can be used to provide greatly increased performance by exploiting the concurrency inherent in many applications. Two examples that require high performance are signal processing and database searching. Networks of transputers can provide the performance needed for both applications. Signal processing algorithms such as the Fast Fourier Transform map easily onto a pipeline. The pipeline can accept input samples at up to 100 kHz, which more than covers the full audio spectrum. A 64-point FFT requires six transputers in the pipeline, a 256-point FFT requires eight, and a 1024-point FFT requires ten. A pair of pipelines, interlinked at each stage, is able to accept input samples at up to 200 kHz. Higher frequencies can be handled by using more transputers in parallel. [Ref. 6]
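The transputer counts quoted above equal log2 of the FFT length (64 = 2^6, 256 = 2^8, 1024 = 2^10), which is consistent with one radix-2 butterfly stage per transputer. The Python sketch below assumes that mapping: each call to the hypothetical `butterfly_stage` stands for the work of one transputer in the pipeline, and chaining the stages yields an ordinary decimation-in-time FFT.

```python
import cmath


def pipeline_stages(n):
    """Radix-2 butterfly stages (assumed: one transputer each) for an n-point FFT."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return n.bit_length() - 1


def bit_reverse(samples):
    """Reorder the input into bit-reversed index order."""
    n = len(samples)
    bits = pipeline_stages(n)
    return [samples[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]


def butterfly_stage(data, stage):
    """One decimation-in-time stage: butterflies of span 2**stage."""
    n = len(data)
    half = 1 << stage
    size = half * 2
    out = list(data)
    for start in range(0, n, size):
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / size)   # twiddle factor
            a, b = data[start + k], data[start + k + half] * w
            out[start + k] = a + b
            out[start + k + half] = a - b
    return out


def fft_pipeline(samples):
    """Pass the data through each stage in turn, as a pipeline would."""
    data = bit_reverse(samples)
    for stage in range(pipeline_stages(len(samples))):
        data = butterfly_stage(data, stage)
    return data
```

In a real pipeline each stage would work on a different block of samples at the same moment, which is where the throughput gain comes from; the sequential loop here only checks the arithmetic.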

An array or a pipeline can also be used for searching. Provided that the search requests can diffuse through the network and the answers converge, the shape of the network does not matter; it can even contain faulty devices. The full internal memory of each transputer can be searched 1000 times per second. With external memory attached to each transputer the search rate is slower, but 64 Kbytes per transputer can be searched at least 30 times per second.


Other applications, such as image processing, finite element analysis, matrix manipulation, telephone switching systems, fault-tolerant systems and artificial intelligence, also lend themselves naturally to arrays, loops or networks of transputers.


B. MULTIPROCESSOR INTERCONNECT STRUCTURES


The traditional approaches to interconnecting computers are based on the use of either serial or parallel links. For tightly coupled (shared memory) systems, where the maximum distances between transmitters and receivers are in the range of tens of meters, parallel cables are typically used, with 8, 16 or 32 bits for data and an equal or perhaps larger number of bits for parity checking and control.
Several basic design alternatives are available for developing a multiprocessor interconnect structure [Ref. 2]. Information can be transferred between computers from source to destination either directly or indirectly. If an indirect transfer strategy is used, one or more switching entities may be employed. Such an intervening switching entity may perform an address transformation or route the message onto one of a number of alternative output paths. Examples of systems based on indirect transfer are loops, buses, star configurations and packet-switched systems. The major difference between the direct and indirect transfer strategies lies in the distribution of message transfer "intelligence". Indirect transfer methods require a more complex communications capability but also increase the fault tolerance of a system. Indirect transfer methods are based on either centralized or decentralized routing of messages.

Another design alternative exists in selecting the message transfer path between computers. The path may be dedicated, as in the loop, star, or completely interconnected system, or shared, as in bus, packet-switched, or shared memory systems. It may also be a combination of both, as in hierarchical systems, where the computer at the top of the pyramid receives messages from several computers, whereas each computer at the bottom of the hierarchy has a single path to the computer "above" it.

Generally, a system based on a dedicated path structure
is more fault tolerant than a system using shared paths. If 
a path that is accessible from more than two points fails, 
no alternative way exists to transfer data between computers 
in the system. However, systems with redundant paths can be 
used to minimize the effects of single-point failures on the 
total system. 

The cost of various interconnect schemes depends on whether a system can be developed using off-the-shelf hardware and/or software or whether the design must be performed "from the ground up". Cost is also related to the number of processors to be used in the system, the amount of memory required in each node of the system, and the bandwidth of the communications links between computers. The cost of completely interconnected systems tends to be high compared with loop- and bus-based systems. Throughput capacity is as much a function of interconnect structure as it is of technology. Use of a twisted pair of wires limits data rates to a few megabits per second over a distance of a few thousand feet. Even at this distance, problems are encountered if the system has too many drops. The bandwidth can, however, be increased using parallel lines, coaxial cable, or fiber optics links. It is, of course, possible to use dial-up or leased telephone lines, but that typically limits the maximum bandwidth to a range of 4800 to 50,000 bit/s. Higher bandwidths are also possible with the use of microwave or satellite links, but these will today have a profound impact on total system cost. [Ref. 2]

Some generalized observations can be made regarding each type of architecture in the areas of cost; modularity, flexibility; reliability, availability, fault tolerance; performance, throughput; ease of development, "off-the-shelfness"; and form factor (design attributes). Depending on how the various design attributes are weighted (which is application dependent), one or more desirable interconnect structures can be selected. The various groups of interconnect methods are: Complete Interconnection, Packet Switched Network, Regular Network, Irregular Network, Hierarchy, Loop, Global Bus, Star, Loop with Switch, Bus Window, Bus with Switch, and Shared Memory.
The choice among the architectures is determined by the number of sensors in the system, their physical location, and the amount of data collected by the sensors for transmission to the system processors. Where the number of sensors is extremely large and data rates are very high, the vulnerability to communications path failures can be mitigated using redundant paths. The problems inherent in any architecture are interprogram communication and database considerations, potential deadlock, and error recovery. [Ref. 2]

Loop technology will be explained in detail in the next section.


C. LOOP COMMUNICATION SYSTEM 


A loop multiprocessor system can be defined as a system consisting of a high-speed, unidirectional, digital communication channel (e.g., a twisted-wire pair or a fiber optics link) arranged as a closed loop or ring. Nodes such as minicomputers or microcomputers, terminals, or peripherals can be attached to the loop channel by a hardware device known as a loop or ring interface. [Ref. 2]


The growing requirements of local communication systems, such as in-house telephone systems, and the introduction of new facilities (alarms, controls, data services, paging) have led to a search for new network concepts. Rather than having several special-purpose networks, it would be desirable to provide one single universal system for transmitting and switching all types of information. Such an integrated system, however, should not depend on complex and expensive central switching equipment. An example of a loop with various subscriber stations connected at arbitrary points is shown in Figure 4.2.¹ The signals are transmitted in one direction only. Digital source signals (or sampled and quantized analog signals) offered by the subscribers are parceled and transmitted in blocks to their destination. Message blocks are assigned to subscriber stations by address coding. [Ref. 13]

¹ Reproduced by permission.


Figure 4.2 Loop Configuration


To send a message from one node to any other, the message is entered on the ring. It then travels around the ring until it either reaches the addressed node or returns to the transmitting node. In some systems the originating node removes the message, whereas in others the destination node removes it. In the former case, the originating node can compare the original message with the message that has circulated around the loop, thereby also performing an error check on it. A bit in a predetermined position in the message is usually set by the destination node to inform the transmitting node that the message was received. In the latter case, the destination node removes the message and usually performs the error checking on it. This approach obviously also reduces the traffic load on the loop, since the message no longer occupies the part of the ring between the destination and the source.
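The two removal policies can be compared with a small Python simulation of a unidirectional ring. `Frame`, its `ack` bit and `circulate` are hypothetical names chosen for this sketch; the hop count shows why removal by the destination lightens the load on the loop.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    src: int
    dst: int
    payload: bytes
    ack: bool = False    # hypothetical "message received" bit set by the destination


def circulate(n_nodes, frame, source_removal=True):
    """Pass a frame node to node around a unidirectional ring of n_nodes.
    Returns (payload delivered, ack bit seen by the source, hops used)."""
    node, hops, delivered = frame.src, 0, None
    while True:
        node = (node + 1) % n_nodes          # forward to the next node downstream
        hops += 1
        if node == frame.dst:
            delivered = frame.payload        # destination copies the message
            if not source_removal:
                return delivered, False, hops    # destination removes the frame
            frame.ack = True                 # mark receipt and pass the frame on
        if node == frame.src:
            # back at the originator, which removes the frame; the returned
            # copy could be compared with the original as an error check
            return delivered, frame.ack, hops
```

On an 8-node ring, a frame from node 1 to node 3 uses 8 hops when the source removes it, but only 2 when the destination does.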
In a loop system, message transmission takes place in the form of addressed blocks of data called frames or slots. The local loop interface forms a frame, giving the address of the destination interface, and transmits the frame onto the loop. Each loop interface downstream of the transmitter receives this frame, checks its destination address, and immediately retransmits it onto the loop if the proper destination for the frame has not been reached. When a receiving loop interface recognizes its own address as the destination of an incoming frame, it removes the frame from the loop and delivers the message to the locally attached minicomputer. Digital transmission on the ring is time-division multiplexed: the channel capacity of the ring is divided into a series of time slots.

Loop architecture, especially when applied to data acquisition, is attractive from a modular point of view. A sensor may be placed anywhere on the loop with a simple interface and may, if necessary, communicate with any node connected to the loop. However, since messages are passed from node to node successively, the failure of a single node or of the path between two nodes can bring down the entire ring. Thus, for unidirectional, active-repeater loops, the failure-effect and failure-reconfiguration attributes are poor. Loop systems that improve fault tolerance (with passive repeaters or bypass relays) are, however, available.

For two-way communication, two channels are normally needed, but in loop-type architectures a single channel is sufficient, since a message can reach any node by travelling around the ring. Such a system will obviously fail if any one of the links or nodes fails. The reliability of the loop can, however, be improved using a coupler. Other, more reliable loop architectures can also be implemented using fiber optics. [Ref. 2]
The following section presents different types of loops.
D. LOOP TYPES


Categorization of the loop configurations is based on the type of message-transmission mechanism employed. There are three main types, which are introduced in the following subsections: the Newhall type, the Pierce type, and the Delay Insertion type. [Ref. 2]
1. Newhall Type Loop 


A control token or character is passed around the loop in a round-robin fashion, from loop interface to loop interface. The interface currently in possession of the token is allowed to transmit messages of arbitrary length onto the loop; the other interfaces are only allowed to receive during this time. When a transmission is completed, the control token is passed to the next node downstream, allowing that node to transmit. Since only one transmitter can be active at any one time, an interface never experiences interference during the transmission of a message. An interface must always wait for the control token to be passed to it, even when it is ready to transmit a message. A Newhall loop provides for variable-length message transmission, but it does not allow concurrent use of the loop channel by two or more transmitter interfaces.
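A minimal Python model of token passing in a Newhall loop. For simplicity it assumes that a node sends one queued message per token visit; the function name and data layout are hypothetical.

```python
def newhall_schedule(pending, start=0, rounds=2):
    """pending[i] is the list of messages waiting at node i. The token
    visits nodes in round-robin order; only the holder may transmit
    (assumed: one message per visit). Returns the (node, message) pairs
    in the order they go onto the loop."""
    n = len(pending)
    queues = [list(msgs) for msgs in pending]
    sent = []
    holder = start
    for _ in range(rounds * n):
        if queues[holder]:
            sent.append((holder, queues[holder].pop(0)))  # sole transmitter
        holder = (holder + 1) % n        # pass the token downstream
    return sent
```

With `pending = [["a"], [], ["b1", "b2"], ["c"]]`, the loop carries a, b1, c, b2: node 2 must wait a full token rotation before it can send its second message, even though it is ready earlier.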


2. Pierce Type Loop 


The communication space on the loop is divided into an integer number of fixed-size slots into which message packets can be stored. It can be thought of as a circular track with boxcars end to end, where some are full and others are empty. The control of a Pierce type loop is centralized. Each slot contains a bit that indicates whether it is filled with a packet or empty. So all a transmitter needs to do is divide each message into packets, wait for empty slots to pass by, and, when this occurs, fill them with packets. Because the slots are of fixed size, user messages are blocked into fixed-size packets prior to being multiplexed onto the line. Various multiplexing techniques can be used.
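Slot filling on a Pierce loop can be sketched in Python as below. The 4-byte slot size, the zero padding of the last packet and the function names are assumptions made for illustration only.

```python
PACKET_SIZE = 4   # hypothetical slot payload size in bytes


def packetize(message: bytes, size: int = PACKET_SIZE):
    """Block a user message into fixed-size packets (last one zero-padded)."""
    return [message[i:i + size].ljust(size, b"\0")
            for i in range(0, len(message), size)]


def fill_slots(slots, message, size=PACKET_SIZE):
    """slots: (full_bit, payload) pairs in the order they pass the transmitter.
    Each empty slot that passes takes the next packet. Returns the new slot
    contents and the number of packets still waiting for an empty slot."""
    packets = packetize(message, size)
    out = []
    for full, payload in slots:
        if not full and packets:
            out.append((True, packets.pop(0)))   # set the full bit, insert packet
        else:
            out.append((full, payload))          # occupied slot passes unchanged
    return out, len(packets)
```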


3. Delay Insertion Type Loop 


This loop type is superior in overall performance. Each ring interface has a complete set of control capabilities. The Delay Insertion Loop has been chosen for our implementation and will be explained in detail in the next chapter.


E. LOOP ANALYSIS 


1. One-way Loop


A very attractive configuration in multiprocessor systems is a loop, because of its remarkable advantages [Ref. 2]. These advantages are:

a) Only one path for the message to follow in reaching 
its destination; no message routing problem in system. 

B) No transmitter needs to know the location of its 
receiver. 

C) Broadcast message transmission is so easy to achieve, 
therefore every node can pick up the message. 

d) Connections can be established very quickly and easily 
(important for traffic with short message duration);in 
several multiprocessor systems based on intermittent 
inquiry-response nature, messages are usually so short 
Ее verification, electronic fund transfer, 
goods.ordering, information retrieval applications). | 

W Digital data transmission eliminates the need for 
modems and data conversion. 

f) Low initial capital investment , in loop configurations 
(east proportional to number of users of interfaces). 


EE oop configuration provides a very high throughput 


E 


(nodal interfaces and not processors are used to rem 
messages; more messages can Бе іп transmission at the 
same time). 

h) Loops are easy to implement with distributed switching 
mechanisms without the need for any sophisticated 
common control (because each loop interface can 
provide its own bus arbitration and synchronization). 

The primary advantage of a loop system is its 
relatively low cost and high modularity. Node processor 
failures can be masked by load sharing among the remaining 
processors, because task addresses are kept in tables in 
each node and checked by the communications software in each 
computer. The loop approach is particularly attractive when 
wide-band coaxial or fiber-optic buses are used to 
interconnect the nodes. 

A loop is very vulnerable to failures of its interfaces 
because of its serial organization. Reliability is therefore 
a major disadvantage, but it can be increased by several 
methods. 

2. Performance of Loop 


Simulation studies on the Distributed Loop Computer 
Network (DLCN) showed that, at low channel utilization, the 
performance of the Newhall loop closely approaches that of 
the DLCN Delay Insertion Loop. As the traffic level 
increases, the Newhall loop becomes comparatively less 
attractive.

The Pierce loop approach is less attractive than 
either the Delay Insertion Loop or the Newhall loop at low 
levels of line utilization, because a message always has a 
mean wait time of half a packet interval and must then be 
transmitted in several packets. At higher traffic levels, 
the performance of the Pierce loop is better than that of 
the Newhall loop, because the packet mechanism permits two 
or more concurrently active transmitters. The Pierce loop 
presents optimal performance characteristics if the message 
is the same size as the packet. Generally, this does not 
occur in a real multiprocessor system environment.

A Delay Insertion Loop is more efficient than either 
of the other two types, since every message is composed of 
message units; its main advantage is that short messages 
have a short delay time even under heavy loads. The average 
transmission time on a loop is independent of traffic load 
for Pierce and Newhall loops, whereas the mean transmission 
time increases significantly with higher traffic loads in 
the Delay Insertion Loop. The Delay Insertion Loop message 
transmission technique is superior where queuing delays for 
messages entering the loop are short.


3. Reliability of Loop


Vulnerability to errors is the major drawback of a 
loop. Transmission errors can affect the proper functioning 
of the loop organization. A distortion of the receiver 
address will either result in a packet being delivered to 
the wrong destination or, if a mutilated address is not 
handled by the system, a "lost" packet will keep circulating 
around the loop. Several message-transmission schemes for 
loop-based systems use a central monitor to check the loop 
and remove packets that have circled the loop more than once 
without being received by any of the nodes. Schemes also 
exist where each interface acts as a loop monitor.
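One common way for a monitor to recognize a packet that has circled more than once is a monitor bit: the monitor sets a bit in every packet it forwards and removes any packet that arrives with the bit already set. The Python sketch below assumes that scheme (the text does not say which mechanism the cited systems use); `Packet` and `circulate` are hypothetical names, and node addresses are assumed to be 0 to n-1.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: int                   # receiver address (possibly mutilated)
    monitor_bit: bool = False   # set when the packet first passes the monitor


def circulate(packet, n_nodes, source, monitor, max_hops=1000):
    """Move a packet around a loop of nodes 0..n_nodes-1, starting after `source`.

    Returns ("received", hops) when the addressed node takes it, or
    ("removed", hops) when the monitor sees it for the second time.
    """
    node = source
    for hop in range(1, max_hops + 1):
        node = (node + 1) % n_nodes
        if node == packet.dest:
            return ("received", hop)
        if node == monitor:
            if packet.monitor_bit:
                return ("removed", hop)   # circled once without being received
            packet.monitor_bit = True
    return ("circulating", max_hops)


print(circulate(Packet(dest=2), 4, source=0, monitor=3))    # ('received', 2)
print(circulate(Packet(dest=99), 4, source=0, monitor=3))   # ('removed', 7)
```

A correctly addressed packet reaches its destination within one circulation, so it passes the monitor at most once; only a "lost" packet returns to the monitor with the bit already set.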

Loop interface failures can cause either a loss of 
access to the loop or a breakdown in loop operation, because 
of the serial nature of a loop. This problem can be solved 
by electrical relay circuitry, and additional protection can 
be provided by opto-isolators.

The reliability of loop configurations can be 
increased by providing a standby loop that parallels the 
main loop. There are two ways in which a standby loop can be 
used in multiprocessor system design: By-Pass and Self-Heal.

In the By-Pass technique, traffic can be routed 
around any number of malfunctioning interfaces, thereby 
maintaining the connectivity of the loop. The major 
shortcoming of this technique is the effect of a failure of 
a reconfiguration unit.

With the Self-Heal technique, based on a bidirectional 
double-loop structure, complete connectivity can be 
maintained when any number of adjacent terminals or 
reconfiguration units fail. When two nonadjacent nodes or 
reconfiguration units fail, the sections of the loop on 
either side of the failures are isolated. This method is 
highly reliable where a limited number of devices are 
attached to the loop.
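The Self-Heal claim can be checked with a small model. The Python sketch below assumes an idealized bidirectional double loop in which the nodes next to each failure fold traffic back onto the second ring, so the working nodes between two failures form one connected segment; `surviving_segments` is a hypothetical helper, not part of any cited system.

```python
def surviving_segments(n_nodes, failed):
    """Group the working nodes of a ring 0..n_nodes-1 into connected segments.

    Assumes each failure is healed by looping traffic back on the second
    ring, so every run of working nodes between failures stays connected.
    """
    failed = set(failed)
    if not failed:
        return [list(range(n_nodes))]
    start = min(failed)                     # begin the walk at a failed node
    segments, current = [], []
    for k in range(1, n_nodes + 1):         # one full turn around the ring
        node = (start + k) % n_nodes
        if node in failed:
            if current:
                segments.append(current)    # a failure closes the current segment
            current = []
        else:
            current.append(node)
    return segments


print(surviving_segments(8, {2, 3}))   # adjacent failures: one segment remains
print(surviving_segments(8, {1, 5}))   # nonadjacent failures: two isolated segments
```

Any set of adjacent failures leaves a single segment, while two nonadjacent failures split the ring into two, matching the statement above.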

New multiplex systems, redundant communication 
loops, hierarchical multiloop systems, an increased number 
of stages in a multi-stage loop, and new switch 
configurations can also increase the reliability of the 
loop.


F. SYSTEM CONFIGURATIONS WITH THE TRANSPUTER


As seen in Chapter II, the T424 transputer provides four 
communication channels (links) for use in a system 
configuration. Possible system configurations for the 
transputer therefore include the following structures.
1. Matrix Structure


Each computing element in this two-dimensional 
structure is connected to each of its neighbors with one 
channel. (If a square matrix is desired for a symmetrical 
structure, the number of computing elements will be 4, 9, 
16, 25, 36, and so on.) It provides a very large number of 
communication links and multiple communication paths, but 
there is no further channel redundancy between two computing 
elements (Figure 4.3).





Figure 4.3 Matrix Structure
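For the matrix structure, channel usage can be counted directly. The Python sketch below, with the hypothetical helpers `mesh_links` and `degree`, numbers the computing elements row by row and connects each one to its right and lower neighbors. It shows that no element needs more than the four links a T424 provides, and that border elements keep spare links for other connections.

```python
def mesh_links(side):
    """List the channel connections of a side x side matrix of elements.

    Element (r, c) is numbered r * side + c; each element is linked to
    its right and lower neighbors, giving 2 * side * (side - 1) links.
    """
    links = []
    for r in range(side):
        for c in range(side):
            node = r * side + c
            if c + 1 < side:
                links.append((node, node + 1))      # horizontal neighbor
            if r + 1 < side:
                links.append((node, node + side))   # vertical neighbor
    return links


def degree(links, node):
    """Number of channels used by one element."""
    return sum(node in pair for pair in links)


for side in (2, 3, 4):
    links = mesh_links(side)
    busiest = max(degree(links, node) for node in range(side * side))
    print(side * side, "elements,", len(links), "links, max links per element:", busiest)
```

Interior elements use all four channels, which is why the matrix offers many paths but no spare channel for a second, redundant link between the same two neighbors.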


2. Tetragonal 3-D Structure


In this structure, each computing element is 
connected to the other three elements, and together they 
build a new computing group which still has four available 
communication channels for connections to other computing 
groups (Figure 4.4).


3. Loop/Ring Structure


Each computing element in this structure is 
connected to each of its two neighbors with two channels. 
This provides redundancy for the communication channels 
between two computing elements (Figure 4.5).
Figure 4.4 Tetragonal 3-D Structure





Figure 4.5 Loop/Ring Structure 


4. Butterfly Structure 


As a special implementation of a ring structure, a 
butterfly structure is a good solution for the Fast Fourier 
Transformation or similar engineering applications (Figure 
4.6).
Figure 4.6 Butterfly Structure


5. The Other Structures 


There is no inherent limit to the size, function, or 
shape of a network of multitransputer systems. The 
transputers can be thought of as building blocks, like 
bricks, since they can be built into systems of arbitrary 
size, function, or shape. Therefore, the overall performance 
of the system depends on the number of transputers. [Ref. 10]


Other possible structures include a functionally 
distributed network (Figure 4.7), a toroidally connected 
array (Figure 4.8), a complete loop regular array (Figure 
4.9), and a bigger system built from four big ones (each a 
Tetragonal 3-D Structure) (Figure 4.10).





Figure 4.7 A Random Network 





Figure 4.8 A Toroidal Array 
Figure 4.9 A Complete Loop Regular Array



Figure 4.10 A Bigger System Built from Four Big Ones


V. DELAY INSERTION TYPE LOOP COMMUNICATION SYSTEM

The Delay Insertion Type Loop is of great importance for 
local loop communication systems with distributed control. 
Various performance studies have shown that its overall 
performance is superior to that of the Newhall and Pierce 
loop types. [Ref. 2]


A. INTRODUCTION


The delay-insertion technique was developed 
simultaneously by E. R. Hafner, Z. Nenadal and M. Tschanz 
for telephone switching purposes and by researchers at Ohio 
State University for the Distributed Loop Computer Network 
(DLCN) system. The main difference between the two is that 
variable-length rather than fixed-length messages are used 
in the DLCN scheme. The general operation of Delay Insertion 
Type Loops is described below using the loop communication 
system developed by Haines. He refers to this scheme as the 
"loop extension strategy".

Digital signals offered by the subscribers are parceled 
and transmitted in blocks to their destination using 
address coding. Access of the terminals to the loop is 
gained by switching a delay network (a shift register 
containing the message block to be transmitted) into the 
loop line. This strategy guarantees that every station is 
able to transmit a message at any time, regardless of the 
traffic conditions in the loop. A laboratory model for 
frequencies of about 10 megabits/s is presently being 
built. It allows data and telephone connections, including 
such features as call back, call transfer and call 
forwarding, with loop transmission realized by means of a 
three-core symmetrical cable that carries the data sequences 
and the timing information separately. This allows the use 
of an extremely simple regenerator. Possible applications 
include integrated in-house communication systems, and 
control and supervisory systems for manufacturing processes, 
railway stations and trains. [Ref. 2]


B. LOOP ORGANIZATION AND OPERATION 




Each subscriber station is equipped for extracting 
information from, and introducing information into, the 
loop. It also contains all the logic functions necessary to 
control these connections; there is no central exchange. 
Depending on the task of a station, its structure may be 
more or less complex; e.g., a receive-only station for one 
single command will need nothing but a decoder for a fixed 
address and the coupling element to the line. The basic 
diagram of a more general station that can transmit 
information as well as receive it is shown in Figure 5.1*. 
[Ref. 13]


Figure 5.1 Basic Function of Delay Insertion Loop

It contains a delay element, preferably a shift 
register, which can be switched into the loop line, and a 
receiver that is permanently connected to the line. The 
delay of the shift register is equal to the length of the 
message block to be transmitted by this particular station. 
In the idle state (passive source) the shift register is 
shunted, and the incoming bits or blocks pass onto the 
outgoing line without significant delay. If a message is to 
be transmitted, however, the delay element is switched into 
the line, which results in an extension of the loop. The gap 
generated in the bit stream permits insertion of the message 
block. This is done automatically by assembling the message 
packet in the shift register before transmission. The 
message travels along the loop and is received by every 
subscriber station. It will only be processed, however, by 
the station that matches the identification label (address). 
In the simplest case the message block is taken out of 
circulation after one entire run at the transmitting station 
by disconnecting the shift register. This also cancels the 
loop line extension. [Ref. 13] 

* Reproduced by permission.

For the practical implementation of this basic function, 
the slightly modified arrangement shown in Figure 5.2* is 
used. This thesis concentrates on this implementation.

Each ring interface has a complete set of control capa- 
bilities. The detailed diagram shown in Figure 5.3* illus- 
trates the basic principle of ring interface operation. 

The Receiving Shift Register (RSR) is permanently 
connected to the incoming line and performs both the 
receiving and the block dropping (removing messages from the 
loop). A second shift register, the Transmitting Shift 
Register (TSR), is used for transmitting, i.e., for 
preparing the message blocks from the node processor for 
insertion into the loop. Sending and receiving are 
controlled by a switch (SW) with three positions, connecting 
either the output of the TSR, the RSR or the incoming loop 
line to the outgoing line. 

* Reproduced by permission. 

Figure 5.3 Delay Insertion Loop and its Operation
The sequence of events is as follows. When a message is 
to be transmitted, the three-position switch (SW) is moved 
from Position 1 to Position 2, which opens (cuts) the loop 
for a well-defined time interval. Since the flow of incoming 
message blocks (the message stream) cannot be stopped 
without loss of information, the bits arriving in the RSR 
during transmission have to be stored: they are temporarily 
buffered and forwarded afterwards. Immediately after the 
message inserted by the subscriber (node) has left the TSR, 
which must be a buffer of the same length as the RSR, the 
switch goes to Position 3 and the output is delayed by one 
message length, as shown in Figure 5.3. The node has now 
entered the loop. It can be called "active" while its switch 
is in Position 3, as opposed to "passive" if its switch 
remains in Position 1.

The process of transmitting a block can be initiated 
only from the passive state. A second message cannot be 
transmitted before the node has become passive, i.e., has 
left the loop. In other words, before transmitting a second 
message, the interface must be switched back into the 
passive state (the node is disconnected from the loop). This 
is done by setting the switch from Position 3 to 1, which 
takes the block in the RSR out of circulation (prevents the 
block in the RSR from circulating in the loop). The first 
bit of the following block and the last bit of the preceding 
one are joined together without leaving a gap. Clearly, the 
switching has to be synchronized in all phases in order to 
prevent damaging blocks. Nevertheless, every station (node) 
is entirely autonomous, as it freely determines the moment 
of transmitting a message (there is no polling). To leave 
the loop, the station must be authorized to take off a 
particular block. The decision rule is very simple if every 
station only cancels (removes) its own messages that have 
circled the loop (Message X in Figure 5.3 is removed before 
the switch is returned from Position 3 to 1). This yields a 
one-to-one correspondence between stations and circulating 
blocks and allows a free choice of the individual block 
size. For every new block the entire cycle is repeated as 
described. The block may contain data, signaling or 
supervisory information, but normally no transmission occurs 
during idle periods, as opposed to circuit switching 
systems, where channel capacity is reserved even during 
transmission pauses. [Ref. 13]
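The effect of the three-position switch on the symbol stream can be sketched in Python. This is a simplified model under stated assumptions: one symbol moves per clock tick, the TSR already holds the message, `delay_insertion` is a hypothetical name, and the later removal of the node's own block (switch back from Position 3 to 1) is not modeled. It shows that inserting a block of k symbols delays the rest of the stream by k ticks without losing any arriving symbol.

```python
from collections import deque


def delay_insertion(incoming, message, start):
    """Model one interface inserting `message` into a passing symbol stream.

    incoming: symbols arriving from upstream, one per clock tick.
    message:  the block held in the TSR; it is inserted from tick `start`.
    Returns the symbols sent on the outgoing line.
    """
    if not message:
        return list(incoming)
    tsr = deque(message)      # transmit shift register
    rsr = deque()             # receive shift register, used as the delay buffer
    position = 1              # Position 1: incoming line straight to outgoing line
    out = []
    for tick, symbol in enumerate(incoming):
        if tick == start:
            position = 2                   # cut the loop and send from the TSR
        if position == 1:
            out.append(symbol)
        elif position == 2:
            rsr.append(symbol)             # arrivals are stored, not lost
            out.append(tsr.popleft())
            if not tsr:
                position = 3               # the message has left the TSR
        else:
            rsr.append(symbol)             # Position 3: loop extended by len(message)
            out.append(rsr.popleft())
    out.extend(tsr)                        # finish the message if input ended mid-transmission
    out.extend(rsr)                        # drain the delay once the input ends
    return out


print("".join(delay_insertion("abcdef", "XY", start=2)))   # abXYcdef
```

The RSR always holds exactly as many symbols as the inserted block once Position 3 is reached, which is the "extension of the loop" described in the text.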

Suitable monitoring is of great importance in such a 
decentralized system. Consider the case where an error 
occurs in the address so that neither the transmitter nor 
the receiver will recognize its block. If no action is 
taken, this block will circulate forever and finally 
congest the loop together with other mutilated messages. 
Meanwhile the originator cannot leave the loop, because 
there is no block it is authorized to remove. This problem 
can be solved by introducing a monitoring station into the 
loop at an arbitrary point. In addition to checking single 
blocks, this station also supervises the entire loop 
operation and provides clocking and synchronization of the 
whole system. This monitoring station is a departure from 
the idea of completely distributed control. Certain 
specialized functions are more economically performed by 
common equipment which is at the disposal of all the 
stations. Examples in telephony are conference call 
facilities and abbreviated addresses. [Ref. 13]


C. IMPLEMENTATION OF DELAY INSERTION LOOP 


1. States of Delay Insertion Loop 


We can think of the Delay Insertion Loop Interface 
as a Finite State Machine with three states corresponding to 
the three positions of the switch (Figure 5.4). 

Figure 5.4 States of the Delay Insertion Loop


In State 1 (passive state), the processor listens to 
inputs from one or more channels. When no messages come in, 
the listening process waits for a well-defined time period, 
and if no messages come in during this period, the channel 
is reported as malfunctioning and a new channel is selected. 
If something does come in, it is checked to see whether the 
message is intended for this processor. If it is, the 
processor copies and acknowledges the message and passes it 
on to the next processor. If the message is not for this 
processor, it is passed on without being acknowledged or 
copied.

During State 1, if a transmission request from the 
user has been generated, a transition to State 2 is made: 
the output channel transmits the message, and the interface 
passes to State 3 immediately after the transmission. If a 
message comes in during the transmission, it is stored; in 
other words, the receiving process continues.

In State 3, a watchdog timer is set and all incoming 
messages are checked for the originating processor number. 
If the number is not this processor's own, the message 
originated elsewhere, and it is passed on to the next 
processor. Otherwise the message is not forwarded, and it is 
checked for an acknowledgement. If the acknowledgement is 
not found, an error is reported and the system returns to 
State 1. If the acknowledgement is found, the "own" message 
is removed from the loop, and the system returns to State 1 
and continues its operation.
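The transitions just described can be written as a table. In the Python sketch below the event names are hypothetical labels for the conditions in the text, not identifiers from the implementation; each transition also returns a flag telling whether an error is reported.

```python
from enum import Enum


class State(Enum):
    PASSIVE = 1    # State 1: switch in Position 1, listening and by-passing
    TRANSMIT = 2   # State 2: switch in Position 2, sending from the TSR
    ACTIVE = 3     # State 3: switch in Position 3, waiting for the own message


# (state, event) -> (next state, error reported?)
TRANSITIONS = {
    (State.PASSIVE, "timeout"): (State.PASSIVE, True),            # channel reported as malfunctioning
    (State.PASSIVE, "transmission_request"): (State.TRANSMIT, False),
    (State.TRANSMIT, "transmission_done"): (State.ACTIVE, False),
    (State.ACTIVE, "own_message_acked"): (State.PASSIVE, False),  # removed from the loop
    (State.ACTIVE, "own_message_not_acked"): (State.PASSIVE, True),
}


def next_state(state, event):
    """Return (next state, error flag); unlisted events leave the state unchanged."""
    return TRANSITIONS.get((state, event), (state, False))


state = State.PASSIVE
for event in ["foreign_message", "transmission_request",
              "transmission_done", "own_message_acked"]:
    state, error = next_state(state, event)
    print(f"{event:22s} -> {state.name}{' (error)' if error else ''}")
```

Events such as a foreign message leave the state unchanged, since by-passing happens in every state without a transition.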


2. Four Transputer Loop System 


The transputer provides four communication channels 
for building a system. This channel availability and the 
desire for reliability led us to consider the Two-Loop 
(ring) or Single-Loop structure, because both are simple to 
implement. Although the system can be affected by a loop 
failure, we concentrate on the Single Unidirectional Loop 
(Figure 5.5) because its implementation is less complex.

Now let us take a Single Unidirectional Loop with 
four transputers as a system, shown in Figure 5.5, and let 
us apply the Delay Insertion Loop methodology based on the 


Finite State Machine idea. 





Figure 5.5 Four Transputer Single-Unidirectional Loop 
As shown in Figure 5.5, there are four transputers 
in the system, TR.1, TR.2, TR.3 and TR.4, connected in a 
ring by one link (channel) each: link1, link2, link3 and 
link4. Initially every transputer is in State 1. If at some 
instant (say t0) TR.1 wishes to transmit a message to TR.3, 
TR.1 goes to State 2, its output channel link1 transmits the 
message, and TR.1 transitions to State 3. TR.2, being in 
State 1, is listening for messages and receives the message. 
It is not intended for TR.2, so TR.2 retransmits (just 
passes) the message through link2 and remains in State 1. 
TR.3, also in State 1, is listening to its input channel 
link2. It recognizes the message as intended for it, copies 
and acknowledges it, passes it on through link3, and stays 
in State 1. TR.4, also in State 1, receives and retransmits 
(passes) the message via link4 and remains in State 1. 
Finally, after one complete circulation, TR.1 receives the 
acknowledged message, does not pass it on, and transitions 
to State 1.
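The walk-through above can be reproduced with a short Python trace. It assumes, as in Figure 5.5, that TR.i transmits on link i to TR.(i mod 4 + 1); the function `trace` and its log wording are hypothetical.

```python
def trace(source, dest, n=4):
    """Follow one message around a single unidirectional loop of n transputers.

    TR.i transmits on link i to TR.(i mod n + 1). Returns (log, acknowledged).
    """
    if not 1 <= source <= n:
        raise ValueError("source must be a transputer number 1..n")
    log = [f"TR.{source}: State 1 -> 2, transmits on link{source}",
           f"TR.{source}: State 2 -> 3, waits for its own message"]
    acknowledged = False
    node = source
    while True:
        link_in = node                 # the link the message arrives on
        node = node % n + 1            # next transputer downstream
        if node == source:
            status = "acknowledged" if acknowledged else "NOT acknowledged (error)"
            log.append(f"TR.{node}: own message back on link{link_in}, {status}; "
                       f"removed, State 3 -> 1")
            return log, acknowledged
        if node == dest:
            acknowledged = True
            log.append(f"TR.{node}: for me, copies, acknowledges, passes on link{node}")
        else:
            log.append(f"TR.{node}: not for me, passes on link{node}")


log, ok = trace(source=1, dest=3)
print("\n".join(log))
```

With a mutilated destination such as 9, the message still returns to TR.1 but unacknowledged, which is the error case of State 3.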

All transputers can also transmit messages at the same 
time, each being in State 2. Every transputer is initially in 
State 1, and each has an individual transmission request. 
Figure 5.6 illustrates this multiple transmission request 
situation: the horizontal timelines describe what is taking 
place in each of the four transputers, and the adjacent 
lines describe the activity on the connected links (link 1 
of computer 1 is denoted L1.1, link 3 of computer 2 is 
denoted L3.2, and so on).


The link operation is independent of the processor 
operation, but the communicating links are completely 
synchronized in their operation. As shown in Figure 5.6, 
TR.1 wants to transmit a message (A) to TR.3, TR.2 wants to 
transmit a message (B) to TR.4, TR.3 wants to transmit a 
message (C) to TR.1, and TR.4 wants to transmit a message 
(D) to TR.2, all simultaneously; thus each transputer 
transitions to State 2. After the transmission, they 
transition to State 3, where they simultaneously listen for 
and receive their own messages. All these operations occur 
at the same time, as explained before. In our implementation 
in simulated mode on the VAX 11/780 VMS system, however, the 
operations are only interleaved: the single processor 
switches from one process to another in time slices.
Figure 5.6 Simultaneous Transmission of Four Transputers


The software implementation of this system is presented 
in the following subsection.
3. OCCAM Implementation of the System


The Delay Insertion Loop Interface in our 
implementation, shown in Figure 5.7, provides inter- and 
intra-communication between systems, transputers and 
processes. It also provides a fault detection (error check) 
feature during communications, using watchdog timers and 
acknowledging techniques.


Figure 5.7 Implementation of Delay Insertion Loop


The Delay Insertion Loop Interface process (Appendix 
A) accepts inputs from links or software (OCCAM) channels. 
At the proper instants in time, the Delay Insertion Loop 
Interface process sends messages through the transputer 
links or OCCAM channels. Three different communication 
types are handled by D.I.Loop.Interface: by-pass (from an 
outer transputer to an outer transputer), internal 
distribution (from an outer transputer to an inner process) 
and external distribution (from an inner process to an outer 
transputer). Briefly, D.I.Loop.Interface listens to the link 
and the user channel, determines the communication type, and 
transmits or receives the message through the channel or 
link.
a. Message Format


A decimally coded message block communication 
scheme is used to obtain efficiency. For simplicity, no 
data is passed in the message block. The message block 
includes the message type, destination transputer number, 
source transputer number, process number and message 
responses (error, receive, etc.). This coded message block 
is accepted in coded form by the loop interface system and 
is used by the same system to determine what should be done 
with it. The coded message block is named CODE in the 
implementation. The CODE is a 32-bit two's complement binary 
word interpreted as a decimal integer in the range 
-2147483648 to 2147483647; only positive values are used. 
The first digit of the ten-digit code represents the message 
type. This digit is not used in our implementation, but it 
can be used for priority (0, 1, 2). The next three two-digit 
groups hold the destination transputer number, the source 
transputer number and the process number, respectively, and 
the last three digits hold the message responses. Each 
two-digit value ranges from 0 to 99, so at most 100 
transputer addresses and 100 processes can be used in this 
system. Table 14 shows the CODE and its digits.
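The digit layout can be made concrete with a packing function. In the Python sketch below, the name `make_code` is hypothetical and the field order x dd ss pp rrr follows Table 14; the rest is an assumption, not the Appendix A code. It also checks the result against the largest 32-bit value, which is why the leading digit can only be 0, 1 or 2, and why a leading 2 further restricts the other fields.

```python
INT32_MAX = 2_147_483_647   # largest positive 32-bit two's complement value


def make_code(msg_type, dest, source, process, responses=0):
    """Pack the fields into the ten-digit decimal CODE  x dd ss pp rrr."""
    fields = (("msg_type", msg_type, 9), ("dest", dest, 99),
              ("source", source, 99), ("process", process, 99),
              ("responses", responses, 999))
    for name, value, limit in fields:
        if not 0 <= value <= limit:
            raise ValueError(f"{name}={value} is outside 0..{limit}")
    code = (msg_type * 10**9 + dest * 10**7 + source * 10**5
            + process * 10**3 + responses)
    if code > INT32_MAX:
        raise OverflowError(f"CODE {code} does not fit in a 32-bit INT")
    return code


print(make_code(0, 3, 1, 7))      # 30107000: type 0, to TR.3, from TR.1, process 7
print(make_code(2, 14, 0, 0))     # 2140000000 still fits
# make_code(2, 15, 0, 0) raises OverflowError: 2150000000 > 2147483647
```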


b. The Algorithm of the System 


The D.I.Loop.Interface process (procedure) (Appendix 
A) has an input channel IN, an output channel OUT and the 
corresponding transputer number TR.NO as formal parameters. 
In the local declarations, TRANSMISSION.REQUEST and 
SENDER.CHANNEL are declared as local channels, TIMEOUT as a 
constant (one circulation period), and SWITCH.POSITION, 
CODE, DEST.TR.NO, SOURCE.TR.NO, PR.NO, CLOCK, FULL, BUFFER1 
and BUFFER2 as local variables. 

There are two processes (procedures) in 
D.I.Loop.Interface: CODE.GENERATOR and DECODER.
TABLE 14 
CODE AND ITS DIGITS 

maximum value   2147483647 
code digits     x dd ss pp rrr 

x     message type (priority 0, 1, 2) 
dd    destination transputer no 
ss    source transputer no 
pp    process no 
rrr   responses 


The CODE.GENERATOR process generates the CODE, interpreted 
as 10 decimal digits, which includes DEST.TR.NO (Destination 
Transputer Number), SOURCE.TR.NO (Source Transputer Number) 
and PR.NO (User Process Number to be executed). The DECODER 
process decodes the CODE by using division and remainder 
operations, and determines the destination and source 
transputer numbers and the user process number.
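The division-and-remainder decoding can be sketched as follows. This Python function is a hypothetical counterpart of DECODER, not the Appendix A code; it assumes the x dd ss pp rrr layout of Table 14.

```python
def decode(code):
    """Split a CODE  x dd ss pp rrr  into its fields with // and %."""
    responses = code % 1000              # last three digits
    process = (code // 10**3) % 100      # pp
    source = (code // 10**5) % 100       # ss
    dest = (code // 10**7) % 100         # dd
    msg_type = code // 10**9             # x
    return msg_type, dest, source, process, responses


print(decode(30107000))      # (0, 3, 1, 7, 0)
print(decode(1990512345))    # (1, 99, 5, 12, 345)
```

Each field is isolated by dividing away the digits to its right and taking the remainder modulo the field's width, so no string conversion is needed.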

The general system is implemented as an infinite 
loop using WHILE TRUE. The variable FULL is initialized to 
FALSE, CLOCK with the NOW function, and SWITCH.POSITION to 
1 (State 1). The variable SWITCH.POSITION simulates the 
positions of the switch and thereby provides the states. 
The three states, 'WHILE switch.position = 1', 'WHILE 
switch.position = 2' and 'WHILE switch.position = 3', are 
executed sequentially inside a SEQ construct.

In State 1 ('WHILE switch.position = 1'), three 
operations, i.e., three ALT (alternative) constructs, 
execute in parallel inside a PAR construct. The first ALT 
represents the receiver and has two guards: the first is an 
input and the second is a special wait statement. In the 
first guard, 'in ? code', if there is an incoming message 
(CODE) on input channel IN, it is decoded by DECODER and then 
checked for the destination address. If 'dest.tr.no = 
tr.no' (the message is for us), the message CODE is copied 
into a buffer (BUFFER1), and SENDER.CHANNEL sends the user 
process number (PR.NO) for execution. If the message is not 
for us (the TRUE branch of the IF), the OUT channel passes 
the CODE to the next transputer (by-pass: the message is 
passed on without any other action). The second guard is the 
special construct 'WAIT NOW AFTER clock + timeout'. This 
construct provides a wait for a period of time (timeout is 
one complete message block execution time). If the time is 
out (no incoming message arrives before the period expires), 
its component process executes and displays "TIME IS OUT. 
NO INCOMING MESSAGE" on the CRT screen. 

In the second ALT construct, there are also two guards. The 
first guard is 'full & SKIP'; since SKIP is always ready, 
this guard succeeds whenever FULL is TRUE (i.e., BUFFER2 
holds a message), and its component process is executed 
(here the OUT channel sends the message in BUFFER2). The 
second guard is the special WAIT construct described above: 
if the TIMEOUT period expires, its component process 
displays "TIME IS OUT" on the CRT screen. In the third ALT 
construct, there are again two guards: 'transmission.request 
? ANY' and the special WAIT statement. If there is a 
transmission request from the user, the TRANSMISSION.REQUEST 
channel carries a signal and its component process 
'switch.position := 2' is executed, so the system goes to 
State 2. If the time is out (second guard), i.e., no 
transmission request arrives from the user before the wait 
period expires, the message "TIME IS OUT. NO XMISSION 
REQUEST" is displayed.
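Python has no ALT construct, but a single input guard paired with a timeout guard can be approximated with `queue.Queue.get(timeout=...)`, which raises `queue.Empty` when the period expires. The sketch below models only the first ALT of State 1, with queues standing in for the channels IN, OUT and SENDER.CHANNEL; the function name `state1_receive` is hypothetical. Following the description above, a message addressed to this transputer is delivered and not forwarded; copying into BUFFER1 and the acknowledgement described in the state-machine subsection are not modeled.

```python
import queue


def state1_receive(inbox, out, deliver, tr_no, timeout):
    """One pass of the State 1 receiver ALT, approximated with a timed get.

    inbox, out, deliver: queues standing in for IN, OUT and SENDER.CHANNEL.
    """
    try:
        code = inbox.get(timeout=timeout)      # guard 1: 'in ? code'
    except queue.Empty:                        # guard 2: the timeout expired
        return "TIME IS OUT. NO INCOMING MESSAGE"
    dest = (code // 10**7) % 100               # DECODER step for the destination
    process = (code // 10**3) % 100
    if dest == tr_no:
        deliver.put(process)                   # for us: send PR.NO to the user side
        return "delivered"
    out.put(code)                              # TRUE branch: by-pass to the next node
    return "bypassed"


inbox, out, deliver = queue.Queue(), queue.Queue(), queue.Queue()
inbox.put(30107000)                            # addressed to TR.3, process 7
print(state1_receive(inbox, out, deliver, tr_no=3, timeout=0.1))   # delivered
print(state1_receive(inbox, out, deliver, tr_no=3, timeout=0.1))   # times out
```

In OCCAM the ALT selects whichever guard becomes ready first; here the single timed `get` plays both roles, which is enough for a two-guard ALT but not for the full three-ALT PAR of State 1.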

In State 2 ('WHILE switch.position = 2'), one ALT 
and one SEQ construct are executed in parallel. In the ALT 
part, one of the two guards is 'in ? code'. If there is an 
incoming message during transmission, its component SEQ 
process is executed: the incoming message (CODE) is stored 
in a buffer (BUFFER2), and the variable FULL becomes TRUE to 
show that BUFFER2 holds a message. If there is no incoming 
message during the transmission period, the wait time 
expires (second guard in the ALT) and "TIME IS OUT. NO 
INCOMING MESSAGE" is displayed on the CRT screen. In the SEQ 
construct inside the PAR (this part represents the 
transmitter), the message CODE is generated by 
CODE.GENERATOR, the OUT channel transmits the CODE, and then 
SWITCH.POSITION immediately becomes 3 (the system goes to 
State 3).
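State 2 runs a receiver and a transmitter in parallel. The Python sketch below imitates that PAR with two threads and joins both before moving to State 3; queues stand in for the channels, and the function name `state2` is hypothetical. Like the description, it buffers at most one arriving message in BUFFER2.

```python
import queue
import threading


def state2(inbox, out, code, timeout):
    """State 2: transmit CODE while buffering a message that arrives meanwhile.

    inbox/out stand in for channels IN and OUT.
    Returns (switch_position, full, buffer2) for use in State 3.
    """
    result = {"full": False, "buffer2": None}

    def receiver():
        try:
            result["buffer2"] = inbox.get(timeout=timeout)   # 'in ? code'
            result["full"] = True                            # BUFFER2 now holds a message
        except queue.Empty:
            print("TIME IS OUT. NO INCOMING MESSAGE")

    def transmitter():
        out.put(code)                                        # OUT transmits the CODE

    branches = [threading.Thread(target=receiver),
                threading.Thread(target=transmitter)]
    for t in branches:        # PAR: start both branches ...
        t.start()
    for t in branches:        # ... and finish only when both have finished
        t.join()
    return 3, result["full"], result["buffer2"]


inbox, out = queue.Queue(), queue.Queue()
inbox.put(40205000)                                  # a message from TR.2 arrives meanwhile
print(state2(inbox, out, 30107000, timeout=0.1))     # (3, True, 40205000)
print(out.get_nowait())                              # 30107000
```

The join on both threads mirrors the OCCAM rule that a PAR terminates only when all of its components have terminated.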

In 'WHILE switch.position = 3' (State 3), two 
alternative constructs execute in parallel inside the PAR 
construct. In the first ALT, if the variable FULL is TRUE, 
in other words if BUFFER2 holds a message, the OUT channel 
sends the message from BUFFER2. Otherwise a wait time 
expires while the receiver waits for an incoming message, 
and "TIME IS OUT" is displayed on the CRT screen. In the 
second ALT inside the PAR construct, if channel IN has a 
CODE (there is an incoming message), the CODE is decoded by 
DECODER and checked for the originating (source) transputer 
number. If 'source.tr.no = tr.no' (this transputer generated 
the message), its own message is removed from the loop to 
prevent further circulation; SWITCH.POSITION becomes 1, and 
the system returns to State 1. If the message is not its 
own, the OUT channel forwards the CODE. In the second guard, 
the wait time expires if the transputer's own message is not 
received, and "TIME IS OUT" is sent to the CRT screen. If a 
fault-tolerant system is designed, it can be activated at 
this point to handle faulty communication.
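The State 3 logic can be sketched as a plain function. For determinism, the sketch takes the arriving codes as a list instead of a channel and treats running out of input as the watchdog firing; the name `state3` is hypothetical, and decoding assumes the Table 14 layout.

```python
def state3(incoming, tr_no, full=False, buffer2=None):
    """State 3: flush BUFFER2, forward foreign messages, remove our own.

    incoming: iterable of CODE values arriving on channel IN.
    Returns (switch_position, forwarded_codes, timed_out).
    """
    forwarded = []
    if full:                             # first ALT: 'full & SKIP' sends BUFFER2
        forwarded.append(buffer2)
    for code in incoming:                # second ALT: 'in ? code'
        source = (code // 10**5) % 100   # DECODER step for the source number
        if source == tr_no:
            return 1, forwarded, False   # own message: remove it, back to State 1
        forwarded.append(code)           # someone else's message: forward it
    return 3, forwarded, True            # watchdog would fire: own message never returned


# TR.1 first flushes its buffered message, forwards TR.2's, then removes its own.
print(state3([40205000, 30107000], tr_no=1, full=True, buffer2=10304000))
# (1, [10304000, 40205000], False)
```

Returning `timed_out=True` with position 3 marks the point where, as the text suggests, a fault-tolerance mechanism could take over.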

In the main program part, the actual channels link1, 
link2, link3 and link4 are declared, and four 
D.I.Loop.Interface processes are executed in parallel inside 
the PAR construct, with the corresponding transputer 
channels and transputer numbers as actual parameters. In 
other words, four transputers are run in parallel.
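The main program's PAR of four interfaces can be imitated with four Python threads connected by queues in a single unidirectional ring. This is a heavily simplified sketch, not the Appendix A program: each hypothetical `interface` thread sends at most one message, forwards everything that is not its own (copying messages addressed to it, as in the state-machine description), drops its own message when it returns, and stops after an idle period instead of looping forever.

```python
import queue
import threading


def code(dest, source, process=0):
    """Pack a CODE with the Table 14 layout (type and responses left at 0)."""
    return dest * 10**7 + source * 10**5 + process * 10**3


def interface(tr_no, link_in, link_out, request, delivered, idle=0.5):
    """Simplified delay insertion loop interface for transputer tr_no."""
    if request is not None:
        link_out.put(request)                 # State 2: transmit our message
    while True:
        try:
            msg = link_in.get(timeout=idle)   # listen on the input link
        except queue.Empty:
            return                            # loop has gone quiet: end the sketch
        dest = (msg // 10**7) % 100
        source = (msg // 10**5) % 100
        if source == tr_no:
            continue                          # own message is back: remove it
        if dest == tr_no:
            delivered.put((tr_no, msg))       # copy a message addressed to us
        link_out.put(msg)                     # pass it on downstream


def run_loop(requests, n=4):
    """Run n interfaces in parallel; TR.i writes link i and reads link i-1."""
    links = [queue.Queue() for _ in range(n)]          # links[i - 1] is link i
    delivered = queue.Queue()
    threads = [
        threading.Thread(target=interface,
                         args=(i, links[(i - 2) % n], links[i - 1],
                               requests.get(i), delivered))
        for i in range(1, n + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(delivered.get() for _ in range(delivered.qsize()))


# Figure 5.6 scenario: A: 1->3, B: 2->4, C: 3->1, D: 4->2, all at once.
requests = {1: code(3, 1), 2: code(4, 2), 3: code(1, 3), 4: code(2, 4)}
for tr_no, msg in run_loop(requests):
    print(f"TR.{tr_no} received CODE {msg}")
```

One design difference is worth noting: OCCAM channels are unbuffered rendezvous, whereas Python queues buffer without limit, so the queues here also play the role of the RSR buffering that the OCCAM program handles with BUFFER2.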

As seen in our program segment, OCCAM provides 
very useful and effective tools for concurrent processing. 
However, although our program conforms completely to the 
OCCAM Reference Manual [Ref. 5], the keyword NOW (the local 
time function, which provides the present time) is not 
accepted by the syntax checker and compiler. Therefore, the 
WAIT function with NOW and AFTER, used as a watchdog timer, 
could not be implemented on the VAX 11/780 VMS system.
VI. CONCLUSIONS


A. SUMMARY 


We have constructed a model of a Delay Insertion Loop 
interface to interconnect clusters of computers. The 
Transputer T424, a hardware component which is expected to 
become available soon (December 1985), was used in a 
simulated mode on the VAX 11/780 to construct the model of 
the Delay Insertion Loop interface. The programming language 
OCCAM, which allows concurrent processes to be executed in 
parallel, was used to create the model for multiprocessing 
in a multitransputer system.

The features of OCCAM and the capabilities of the T424 
Transputer were presented in detail. The possible structures 
with the Transputer in a multiprocessing environment, the 
possible multitransputer systems and the ideas of 
multiprocessing were explained.

The loop technology was examined, the Delay Insertion 
Loop was emphasized, and some configurations with four and 
sixteen transputers were suggested. Using OCCAM and the 
Transputer, the Delay Insertion Loop Interface was 
implemented for the Four-Transputer Single-Unidirectional 
Loop System on the VAX 11/780 VMS system.


B. RESULTS AND COMMENTS 


This thesis work showed that the Delay Insertion Loop 
Network Interface within a multitransputer system is a very 
attractive alternative that is likely to achieve high 
performance in many present applications. It may become a 
good candidate for many military real-time applications in 
the future. 
The Delay Insertion Loop methodology is not limited to 
multitransputer systems; it can also be used with any 
multiprocessor or multiprocessor cluster computer system. 

The single unidirectional loop is a very simple 
communication system to implement, but it is subject to loop 
failure: because every message travels around the same ring 
in one direction, a single broken link interrupts all 
traffic. Adding a second loop to the system is one practical 
method of increasing reliability. 


Concurrency can provide considerable gains in performance 
in many application areas. The T424 transputer is a new 
high-performance product that allows concurrent processing; 
it is an evolving hardware component in a new phase of 
computer technology. Designing concurrent systems is 
difficult even with a relatively simple hardware 
architecture like the T424 transputer. However, as the 
understanding of concurrent processing improves, very 
powerful multitransputer systems can be built to support 
real-time processing. 

OCCAM, which is based on a model of concurrency, is a 
simple language in which to learn to write programs. 
Concurrent processing itself, however, is hard to understand 
and to implement, especially for inexperienced programmers, 
and some expertise is required to apply a concurrent system. 
As the number of processors or transputers in a system 
increases, the software implementation becomes more complex; 
in particular, the communication between transputers or 
transputer systems becomes more complicated. 

In OCCAM programming, execution errors such as Deadlock, 
Stop and Access Violation are often encountered until one is 
familiar with the PAR (parallel) and ALT (alternative) 
constructs and with the use of channels for input and 
output. The other constructs are much easier; in particular, 
the SEQ (sequential) construct provides statement-by-statement 
execution like other conventional programming languages. 
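A classic source of the Deadlock error is two processes that each must output a value and input the other's. If both are written sequentially as "output, then input" over synchronous channels, each blocks on its output waiting for a partner that is itself blocked, and neither proceeds. The sketch below uses Go as a stand-in for OCCAM, with hypothetical functions side and exchange, and shows the usual cure: perform the output and the input in parallel, as an OCCAM PAR would. Only the corrected version is run, since the sequential one would simply hang.

```go
package main

import "fmt"

// side is one of two processes that must both output a value and input the
// other's value. Written sequentially as "out ! x" then "in ? y" on BOTH
// sides, this deadlocks with synchronous channels. Doing the output and the
// input in parallel, as an OCCAM PAR would, removes the deadlock.
func side(x int, out chan<- int, in <-chan int) int {
	sent := make(chan struct{})
	go func() { // PAR branch 1: out ! x
		out <- x
		close(sent)
	}()
	y := <-in // PAR branch 2: in ? y
	<-sent    // a PAR finishes only when all of its branches have finished
	return y
}

// exchange runs the two sides concurrently over unbuffered channels.
func exchange(a, b int) (gotA, gotB int) {
	ab := make(chan int) // A -> B, unbuffered like an OCCAM channel
	ba := make(chan int) // B -> A
	resB := make(chan int)
	go func() { resB <- side(b, ba, ab) }()
	gotA = side(a, ab, ba)
	gotB = <-resB
	return gotA, gotB
}

func main() {
	fmt.Println(exchange(1, 2)) // prints 2 1
}
```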
We worked with the initial version of OCCAM and its 
compiler as installed on our VAX 11/780 VMS system. As a 
result, we encountered some difficulties and problems, such 
as problems with run-time checking and with some constructs 
and keywords (NOW, PLACEDPAR and PRIPAR) that the syntax 
checker did not accept. Newer versions of OCCAM may not 
have such problems. 


C. SUGGESTIONS FOR FOLLOW-ON WORK 


This thesis addressed only the implementation of the 
Delay Insertion Loop Interface with a four-transputer 
single-unidirectional loop system. Possible continuations of 
this work include implementations of the single-bidirectional 
loop, the bidirectional two-loop system, or the complete loop 
system with sixteen transputers. 

In the bidirectional two-loop system structure shown in 
Figure 6.1, each transputer is connected to its two 
neighboring transputers with two channels. Message traffic 
can flow in both directions, but circulating traffic in one 
direction is less complicated. The system is unaffected by a 
single loop failure. As seen in Figure 6.1, the loops can be 
named ODD and EVEN. One loop is always on duty (active) 
while the other rests (passive). If a failure occurs while 
the active loop is running, the spare loop (which waits idle) 
takes over its job immediately. 
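The active/passive takeover rule can be captured in a few lines. What follows is a minimal Go sketch, assuming only what the paragraph states: two loops named ODD and EVEN, one on duty at a time, and immediate takeover by the spare when the active loop fails. The type dualLoop and its methods are hypothetical and model only the switching decision, not the message traffic itself.

```go
package main

import "fmt"

// dualLoop is a hypothetical model of the ODD/EVEN scheme: one loop is
// active, the other is a passive spare that takes over on failure.
type dualLoop struct {
	names  [2]string
	failed [2]bool
	active int // index of the loop currently on duty
}

func newDualLoop() *dualLoop {
	return &dualLoop{names: [2]string{"ODD", "EVEN"}}
}

// fail marks loop i as broken; if it was the active loop, the spare takes over.
func (d *dualLoop) fail(i int) {
	d.failed[i] = true
	if i == d.active {
		d.active = 1 - i
	}
}

// route names the loop that would carry the next message, or reports
// false when both loops have failed.
func (d *dualLoop) route() (string, bool) {
	if d.failed[d.active] {
		return "", false
	}
	return d.names[d.active], true
}

func main() {
	d := newDualLoop()
	fmt.Println(d.route()) // prints ODD true
	d.fail(0)              // the active ODD loop breaks
	fmt.Println(d.route()) // prints EVEN true
}
```

A failure of the passive loop leaves the active one on duty; only a second failure leaves the system without a working loop.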

In the regular array complete loop system shown in Figure 
6.2, there are sixteen transputers (or clusters of 
transputers) interconnected by single links. There are four 
vertical and four horizontal loops in the system. If one of 
the links fails, the corresponding loop is canceled and 
removed from the system without serious effects. 
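The rule "a failed link cancels only its own loop" is easy to check on a 4 x 4 array. This Go sketch assumes a hypothetical row-by-row numbering of the transputers from 0 to 15 and treats each row and each column as a closed loop of four. Under that assumption, every link lies in exactly one of the eight loops, so one failure removes one loop and leaves the other seven running. The names loopID, loopOfLink, array and failLink are illustrative only.

```go
package main

import "fmt"

const side = 4 // 4 x 4 array of transputers, as in Figure 6.2

// loopID names one of the eight loops: four horizontal (rows), four vertical (columns).
type loopID struct {
	vertical bool
	index    int
}

// adjacent reports whether positions x and y are neighbours on a closed loop of length side.
func adjacent(x, y int) bool { return (x+1)%side == y || (y+1)%side == x }

// loopOfLink finds the loop that a link between transputers a and b belongs to,
// numbering transputers 0..15 row by row (a hypothetical numbering).
func loopOfLink(a, b int) (loopID, bool) {
	ra, ca, rb, cb := a/side, a%side, b/side, b%side
	switch {
	case ra == rb && adjacent(ca, cb):
		return loopID{vertical: false, index: ra}, true
	case ca == cb && adjacent(ra, rb):
		return loopID{vertical: true, index: ca}, true
	}
	return loopID{}, false
}

// array tracks which loops are still in service.
type array struct{ active map[loopID]bool }

func newArray() *array {
	a := &array{active: map[loopID]bool{}}
	for i := 0; i < side; i++ {
		a.active[loopID{vertical: false, index: i}] = true
		a.active[loopID{vertical: true, index: i}] = true
	}
	return a
}

// failLink cancels the loop that contains the broken link; the other loops keep running.
func (a *array) failLink(x, y int) bool {
	id, ok := loopOfLink(x, y)
	if !ok {
		return false // x and y are not directly linked
	}
	delete(a.active, id)
	return true
}

func main() {
	a := newArray()
	a.failLink(5, 6)           // a link in row 1 breaks
	fmt.Println(len(a.active)) // prints 7
}
```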
Figure 6.1 Two-Loop System with Four Transputers 







Figure 6.2 16 Transputer Regular Array Complete Loop 
The multicluster shared memory system shown in Figure 6.3 
[Ref. 14] is also suggested for further implementation of the 
Delay Insertion Loop methodology. 
Figure 6.3 Suggested Loop Interface 
APPENDIX A 
DELAY INSERTION LOOP INTERFACE 




CHAN screen AT 1 :
CHAN keyboard AT 2 :
DEF end.buffer = -3 :
VAR char.string[BYTE 512] :


-- * WRITING *


PROC write.screen (VALUE string[]) =
  SEQ
    SEQ i = [1 FOR string[BYTE 0]]   -- byte 0 holds the string length
      screen ! string[BYTE i]
    screen ! end.buffer :




PROC D.I.Loop.Interface (CHAN in, out, VALUE tr.no) =
  CHAN transmission.request, sender.channel :
  DEF timeout = … :                 -- value illegible in the original listing
  VAR switch.position, code, dest.tr.no, source.tr.no,
      pr.no, clock, full, buffer1, buffer2 :




  PROC Decoder =
    SEQ
      dest.tr.no := code / 10000000
      source.tr.no := (code \ 10000000) / 100000
      pr.no := (code \ 100000) / 1000 :
  WHILE TRUE                        -- infinite loop
    SEQ
      full := FALSE                 -- buffer2 is empty
      clock := NOW                  -- present time
      switch.position := 1          -- switch is initially in State 1
      -- * STATE 1 *


      WHILE switch.position = 1     -- State 1
        PAR
          ALT
            in ? code               -- incoming message
              SEQ
                Decoder             -- decode the message
                IF
                  dest.tr.no = tr.no        -- if the message is for us
                    SEQ
                      buffer1 := code       -- copy the message
                      sender.channel ! pr.no  -- send to execution
                      out ! …               -- acknowledge signal (operand illegible)
                  TRUE                      -- if the message is not for us
                    out ! code              -- pass the message on
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT. NO INCOMING MESSAGE")
          ALT
            full & SKIP             -- if buffer2 is full
              out ! buffer2         -- send the message in buffer2
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT")
          ALT
            transmission.request ? ANY      -- transmission request?
              switch.position := 2          -- go to State 2
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT. NO XMISSION REQUEST")


      -- * STATE 2 *
      WHILE switch.position = 2     -- State 2
        PAR
          ALT
            in ? code               -- incoming message during transmission?
              SEQ                   -- if there is an incoming message
                buffer2 := code     -- store it in buffer2
                full := TRUE        -- buffer2 is full
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT. NO INCOMING MESSAGE")
          SEQ
            Code.Generator          -- generate message for transmission
            out ! code              -- transmit the message
            switch.position := 3    -- then go to State 3
      -- * STATE 3 *
      WHILE switch.position = 3     -- State 3
        PAR
          ALT
            full & SKIP             -- if buffer2 is full with a message
              out ! buffer2         -- send the message in buffer2
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT")
          ALT
            in ? code               -- if there is an incoming message
              SEQ
                Decoder             -- decode the message
                IF
                  source.tr.no = tr.no      -- if it is our own message
                    SEQ
                      code := 0             -- remove the message from the loop
                      switch.position := 1  -- return to State 1
                  TRUE                      -- if it is not our own message
                    out ! code              -- just pass the message on
            WAIT NOW AFTER clock + timeout  -- if time is out
              write.screen ("TIME IS OUT") :
-- * MAIN PROGRAM *

CHAN link1, link2, link3, link4 :
PAR
  D.I.Loop.Interface (link4, link1, 1)
  D.I.Loop.Interface (link1, link2, 2)
  D.I.Loop.Interface (link2, link3, 3)
  D.I.Loop.Interface (link3, link4, 4)
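The Decoder procedure packs three fields into one integer code word by decimal position. The Go sketch below reproduces that arithmetic (OCCAM's `\` remainder operator is Go's `%`) together with the State 3 rule that an interface removes its own returning message and goes back to State 1. The source.tr.no formula is an assumption: the corresponding line is damaged in the printed listing, and the formula shown follows the pattern of the two legible lines. The encode function is hypothetical, since the listing's Code.Generator is not shown.

```go
package main

import "fmt"

// decode mirrors the Decoder procedure's arithmetic. The sourceTrNo formula
// is a reconstruction, since that line is damaged in the printed listing.
func decode(code int) (destTrNo, sourceTrNo, prNo int) {
	destTrNo = code / 10000000
	sourceTrNo = (code % 10000000) / 100000
	prNo = (code % 100000) / 1000
	return
}

// encode is a hypothetical inverse of decode; low fills the three least
// significant digits, which decode ignores.
func encode(destTrNo, sourceTrNo, prNo, low int) int {
	return destTrNo*10000000 + sourceTrNo*100000 + prNo*1000 + low
}

// state3 applies the State 3 rule: an interface that sees its own message
// return removes it and goes back to State 1; any other message is passed on.
func state3(code, trNo int) (forward bool, nextState int) {
	if _, source, _ := decode(code); source == trNo {
		return false, 1
	}
	return true, 3
}

func main() {
	code := encode(3, 1, 42, 7)  // dest.tr.no 3, source.tr.no 1, pr.no 42
	fmt.Println(code)            // prints 30142007
	fmt.Println(decode(code))    // prints 3 1 42
	fmt.Println(state3(code, 1)) // prints false 1: back at the sender, removed
}
```

With a 32-bit word, as on the T424, the largest positive value is 2147483647, so under this layout dest.tr.no cannot exceed 214.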